• Resolved febwa1976

    (@febwa1976)


    In my site stats I have recently been getting a lot of referrers with /robots.txt at the end. Some come from regular big companies but a lot come from weird places and from sites that could have no possible interest in my blog.

    It does not seem to be doing any harm, but can anybody tell me what this is?

    Thanks

Viewing 12 replies - 1 through 12 (of 12 total)
  • robots.txt is used to restrict search engines. Some site out there is linking to your blog, and a crawler robot is following that link to scan your site. Well-behaved crawlers request /robots.txt before anything else, which is why it shows up in your stats. You can use robots.txt if you don’t want a certain section of your website indexed.

    Thread Starter febwa1976

    (@febwa1976)

    Thanks.

    I guess I am just reacting to how much this has accelerated in the last two days, coming from places where I have never had traffic before, like yahoo360, msn blogs, ebay forums, as well as stuff I cannot read from China.

    How does a site spider a WordPress blog? Could I disallow everything in my WordPress directory and still have the spider find the different articles, because they are called from the database? Or do I need part of the WordPress directory to be accessible to a site spider?

    chillbilly

    (@chillbilly)

    Yeah, totally. You can disallow any file from the robot, or even disallow robots altogether.

    Here is an example from one of my sites. There are so many things you can do with it, too much to list here, but if you just google robots.txt or just robots, you should find loads of info on how to set them up just right.

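    # AdSense crawler: an empty Disallow blocks nothing, so it may fetch everything
    User-agent: Mediapartners-Google
    Disallow:

    # All other robots: stay out of these files and directories
    # (every path must start with a slash)
    User-agent: *
    Disallow: /admin.php
    Disallow: /admin/
    Disallow: /images/
    Disallow: /includes/
    Disallow: /themes/
    Disallow: /blocks/
    Disallow: /modules/
    Disallow: /language/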
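
If you want to check what a rule set like this actually blocks before uploading it, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules are a shortened copy of the example above, with `admin.php` written with a leading slash and the user-agent name without the asterisk. `example.com`, `SomeOtherBot` and the sample paths are placeholders I made up, and real crawlers may interpret edge cases differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the rules above; example.com is a placeholder host.
RULES = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /admin.php
Disallow: /admin/
Disallow: /images/
"""


def is_allowed(rules: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # "SomeOtherBot" is a made-up name standing in for any other crawler.
    for agent in ("Mediapartners-Google", "SomeOtherBot"):
        for path in ("/admin/", "/images/logo.png", "/2006/01/a-post/"):
            url = "http://example.com" + path
            print(agent, path, is_allowed(RULES, agent, url))
```

The empty `Disallow:` line means the first group blocks nothing, while every other robot is kept out of the listed paths and can still reach ordinary post URLs.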

    WPChina

    (@wordpresschina)

    It would be really nice to have a default robots.txt accompany all installs of WP in the future. The robots.txt can be blank, but so long as it is there, server error logs can be kept clean.

    True, it is very simple for people to add their own, but there are many newbies who don’t know much about these things, and it would relieve some stress.

    chillbilly

    (@chillbilly)

    Man, it would be fairly easy to just set up some default robots.txt files for this stuff, then maybe a section with some robots.txt mods for different things. I mean things like disallowing some robots that are sucking back too much bandwidth and stuff like that. Maybe even a few mods for .htaccess files too, to help stop hotlinking and stuff for people that need it.
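
As a rough idea of what such a "default file plus mods" helper could look like, here is a small Python sketch that builds robots.txt text from a list of robots to shut out completely and a list of paths to hide from everyone else. The function name `build_robots_txt` and the robot name `BandwidthHogBot` are invented for illustration. Bear in mind that robots.txt only works for robots that choose to obey it.

```python
def build_robots_txt(blocked_agents, disallowed_paths):
    """Return robots.txt text.

    Each agent in `blocked_agents` gets its own group with "Disallow: /",
    and all other robots get the `disallowed_paths` list.
    """
    groups = []
    for agent in blocked_agents:
        groups.append(f"User-agent: {agent}\nDisallow: /\n")
    rules = "".join(f"Disallow: {path}\n" for path in disallowed_paths)
    # An empty Disallow keeps the catch-all group valid when no paths are given.
    groups.append("User-agent: *\n" + (rules or "Disallow:\n"))
    # Groups are separated by a blank line.
    return "\n".join(groups)


if __name__ == "__main__":
    # "BandwidthHogBot" is a hypothetical robot name.
    print(build_robots_txt(["BandwidthHogBot"], ["/admin/", "/images/"]))
```

Keeping each blocked robot in its own group means the catch-all `User-agent: *` rules stay untouched when you add or remove a robot from the list.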

    dp76

    (@dp76)

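    It’s not a good idea to disallow Mediapartners-Google.
    If you have AdSense on your site, you need to allow Mediapartners-Google. (An empty Disallow: line, like the one in the example above, blocks nothing, so that group does allow it.)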

    Can anyone suggest the right robots.txt? I want to implement this file, but I have no idea what to put in it.

    Well, what do you want included in search engine listings, and what not?

    Say you have three sections:

    example.com/food
    example.com/drink
    example.com/secrets

    If you wanted search engines to stay out of secrets, it would be:

    User-agent: *
    Disallow: /secrets/

    You only need it if you don’t want search engines to include stuff in their indexes. If there’s nothing like that, just upload a blank file called robots.txt to prevent error 404s.
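
To see the difference between the /secrets/ rule and a blank file, here is a minimal sketch with Python's standard `urllib.robotparser`. `AnyBot`, `example.com` and the page names are placeholders, and this shows how Python's parser reads the rules, which real search engines may not match in every detail.

```python
from urllib.robotparser import RobotFileParser

SECRETS_RULES = "User-agent: *\nDisallow: /secrets/\n"
BLANK_RULES = ""  # what a crawler sees when robots.txt is an empty file


def may_crawl(rules_text: str, path: str) -> bool:
    """Return True if a generic crawler may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    # "AnyBot" and example.com are placeholders, not real crawler or site names.
    return parser.can_fetch("AnyBot", "http://example.com" + path)


if __name__ == "__main__":
    for path in ("/food/", "/drink/", "/secrets/plans.html", "/secrets"):
        print(path, may_crawl(SECRETS_RULES, path), may_crawl(BLANK_RULES, path))
```

With the blank file everything stays crawlable. Note that rules are matched as path prefixes, so `Disallow: /secrets/` covers everything under the directory but not a bare `/secrets` URL without the trailing slash.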

    It needs to be in your ROOT directory, though. So: example.com/robots.txt. Anywhere else and it won’t work.
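
To make the root-directory point concrete: whatever page a crawler is about to visit, the robots.txt it looks for is derived only from the scheme and host. A small Python sketch of that derivation follows. The helper name `robots_url` and the sample URLs are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt location for the site that serves `page_url`.

    Only the scheme and host are kept; the path, query and fragment are dropped.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Hypothetical page URLs on the example.com placeholder host.
    for page in ("http://example.com/food/pizza.html",
                 "http://example.com/blog/wordpress/?p=12"):
        print(page, "->", robots_url(page))
```

So if WordPress lives in a subdirectory, a robots.txt placed inside that subdirectory is never looked at. It has to sit at the top of the domain.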

    Do you really need to disallow admin etc.? The robots won’t be able to log in (or is that to help prevent hacks?)

    My purpose in implementing robots.txt is to get search engines to read it and include my blog.

    I got the idea of how to restrict sites and sections from the code above.

    But what should I include to let search engines find the site without compromising the site’s security?

    How would the search engines be able to find your robots.txt unless they knew about your blog in the first place?!

    The only way to get search engines to visit your site is if they find a link to it somewhere, e.g. your username.

    Robots.txt is only for restricting certain areas.

  • The topic ‘robots.txt’ is closed to new replies.