• this question has to do with wordpress, but it’s kinda more seo’s field, so if this isn’t the place to ask, please move it :)

    you probably know about robots.txt, the standard file for telling well-behaved bots not to crawl certain stuff

    i was wondering… shouldn’t we disallow indexing of folders like wp-admin, wp-content or wp-includes? that would cut bandwidth consumption and reduce the work for the server.
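
    something like this is what i have in mind – just a sketch, assuming wordpress lives in the site root:

    User-agent: *
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /wp-content/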

    or do you think it would hurt the position of the web page in search engines? any ideas? any opinions?

  • Moderator James Huff

    (@macmanx)

    shouldn’t we disallow indexing of folders like wp-admin, wp-content or wp-includes?

    I certainly have. Those files are for core WordPress use only. There’s no reason for them to be indexed. In all honesty, you could use robots.txt to ban everything except index.php without seeing any PageRank impact.
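
    Taken literally, and for Googlebot specifically, that could be as blunt as this (Allow is a Google extension, not part of the original robots.txt standard, so don’t count on other bots honoring it):

    User-agent: Googlebot
    Allow: /index.php
    Disallow: /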

    Thread Starter zootropo

    (@zootropo)

    that’s what i thought. but are you sure? :)

    shouldn’t it then be a default for wordpress?

    wp-admin and wp-includes: the contents of those are probably identical across all wp blogs. The bandwidth saving is minimal compared to what someone could save by properly optimising the rest of their site.

    User-Agent: Googlebot
    Disallow: /

    That’s what you need :p

    Thread Starter zootropo

    (@zootropo)

    come on podz, i’m serious

    why isn’t that the default, when nofollow is? how come?

    and i’m not criticizing anything. i’m just curious

    Google make the rules.
    Google took a perfectly legitimate W3C attribute, bolted nofollow onto it, and used it to try and clean up the mess that they made. No doubt Google will again try to make everyone else pay in some way when their nefarious ways screw things over. No, I’m not a fan.

    Either way – would it save bandwidth? Yes, some.
    Would the time be better spent optimising other areas of your site? Yes.
    Does G-bot probably ignore it anyway? Yes.
    Should WP ship with or advise about robots.txt? No – that’s an end-user decision based upon knowledge.

    Moderator James Huff

    (@macmanx)

    why isn’t that the default, when nofollow is? how come?

    Now you’re talking about two entirely different things here. nofollow instructs Googlebot not to follow any links in the comments, whereas robots.txt can be altered to give Googlebot very detailed instructions. As for shipping robots.txt with WP, I agree with Podz: “Should WP ship with or advise about robots.txt? No – that’s an end-user decision based upon knowledge.”
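
    Roughly, the difference looks like this (example.com is just a placeholder). The nofollow side, as WordPress writes it into a comment:

    <a href="http://example.com/" rel="nofollow">commenter’s site</a>

    And the robots.txt side, as standing instructions for the whole site:

    User-agent: Googlebot
    Disallow: /wp-admin/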

    Does G-bot probably ignore it anyway? Yes

    Actually, I ban all bots that ignore robots.txt, and Googlebot still visits several times per day. So, I’m inclined to say that it does not ignore robots.txt (especially since Google itself recommends robots.txt as a method of Googlebot control).
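
    For the curious, the banning itself happens in .htaccess, along these lines – Apache syntax, and “BadBot” is just a placeholder for whatever user-agent string you’ve caught ignoring robots.txt:

    SetEnvIfNoCase User-Agent "BadBot" bad_bot
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot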

    Moderator James Huff

    (@macmanx)

    No, I’m not a fan.

    For those who are curious about Podz’ statement, or those who just want to read some logical arguments against Google, see:

    https://www.tamba2.org.uk/T2/archives/2005/03/23/google-steals/
    https://www.tamba2.org.uk/T2/archives/2005/03/27/more-on-google/
    https://www.tamba2.org.uk/T2/archives/2005/04/24/google-screws/

    Bot control is a massive topic, it really is.
    Head over to WebmasterWorld and search for “perfect .htaccess” – it’s a long and complex set of threads and off-shoots.

    Another question re: robots. In this case Google is the search engine in question, but it may well apply to other search engines too.

    I post on my blog and the post appears on the index page. I have the options set to show the 10 most recent posts on that page. Eventually, as more posts are added, the post in question gets pushed down the list and rolls over onto page 2, page 3, page 4, and so on. Normal, fine, no problem.

    Now I’m seeing referrals from Google that point to my WordPress paged listings, i.e. /page/3/, instead of to the archived post. The person clicks on the Google result and doesn’t find the article in question, because time has passed and the post that was on page 3 has rolled over to page 4.

    My thought is to exclude the index and the /page/ directories and just let the robots crawl the archives, so a search result will always point to the archived post, and not to the index or /page/, which may or may not still contain the post the person is searching for.
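
    Concretely, something like this is what I’m considering (assuming all my paged listings live under /page/ – blocking the index page itself is trickier, since Disallow: / would block the whole site):

    User-agent: *
    Disallow: /page/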

    thoughts? better ideas?

  • The topic ‘disallow in robots.txt?’ is closed to new replies.