• Hi,

    Today I discovered something strange when I checked the option to discourage search engines from indexing my site.

    WordPress adds “noindex, follow” to the source of my site. But this “noindex, follow” cannot be seen by Google, because at the same time the virtual robots.txt changes so that search engines are not allowed to crawl the site…
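
    To be concrete, with that option checked my pages get something like this in the head (example output – the exact markup may differ per WordPress version):

        <meta name='robots' content='noindex,follow' />

    and the virtual robots.txt that WordPress serves at /robots.txt (when no physical file exists) becomes:

        User-agent: *
        Disallow: /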

    Those two things cannot be used together. Why does WordPress behave like this?

    Kind regards,

    Willem

  • It’s done as a fallback. Google isn’t the only search engine out there, and all of their bots work in slightly different ways. To cover as many bases as it can, WordPress adds the tags as you’ve seen, so that if one bot doesn’t recognise or honour the meta tag or the robots.txt file, it will most likely recognise the other method.
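
    Roughly speaking, core wires up both mechanisms along these lines (a simplified sketch, not the literal core source, and it varies a bit between versions):

        // 1) Print the meta tag in the head of every page.
        //    noindex() outputs <meta name='robots' content='noindex,follow' />
        //    when get_option( 'blog_public' ) is '0'.
        add_action( 'wp_head', 'noindex', 1 );

        // 2) Build the virtual robots.txt that is served for /robots.txt.
        $output = "User-agent: *\n";
        if ( '0' == get_option( 'blog_public' ) ) {
            $output .= "Disallow: /\n";          // block everything
        } else {
            $output .= "Disallow: /wp-admin/\n"; // only keep crawlers out of admin
        }
        echo apply_filters( 'robots_txt', $output, get_option( 'blog_public' ) );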

    Thread Starter Willem-Siebe

    (@siebje)

    Hi,

    But why as a fallback? We probably all know that robots.txt does not stop Google from showing your site in the search results, while ‘noindex’ does… but because WordPress also generates a robots.txt file, nobody gets the benefit of the ‘noindex’.

    Kind regards,

    Willem

  • So, the robots.txt file is useless?

    https://support.google.com/webmasters/answer/156449?hl=en
    https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt

    Reading through those details from Google directly, it doesn’t seem that way. Your pages may still show up in search engine results, but they may do that no matter what you do.

    If you want the site to be truly non-indexable, the only way that you can guarantee it is to put it behind a restricted login area – and I mean the whole site. I do this using .htpasswd files and user restrictions.
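
    For example, something along these lines in the site’s .htaccess does it (the .htpasswd path is just a placeholder – point it at the real file on your server, and create that file with Apache’s htpasswd utility):

        # Protect the whole site with HTTP basic authentication
        AuthType Basic
        AuthName "Restricted site"
        # Placeholder path - use the real location of your .htpasswd file
        AuthUserFile /full/path/to/.htpasswd
        Require valid-user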

    As is stated everywhere, the robots.txt file and the noindex tags are indicators only, and don’t mean that things won’t be crawled. Having both gives you the best chance of having crawlers honour your requests.

    Thread Starter Willem-Siebe

    (@siebje)

    Hi, I did not say that robots.txt is useless, but it is a fact that when your site is blocked with robots.txt, Google honors this and will not index the content, yet the site can still show up in the search results (those are two different things).

    Because WordPress also generates that robots.txt file for you, the ‘fallback’ “noindex” rule in the HTML can never be read by Google, since the robots.txt prevents Google from fetching the pages in the first place.

    But my problem is that the noindex is only used as a fallback, while it is the noindex that really prevents your site from being shown in the search results, whereas the robots.txt does not prevent this. So why not skip the robots.txt rule altogether?
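
    For what it’s worth, the virtual robots.txt can be filtered from a small plugin or the theme’s functions.php. A minimal sketch (untested; the function name is just an example) that keeps the noindex meta tag but drops the blocking rule, so crawlers can still fetch the pages and see the noindex:

        // Stop the virtual robots.txt from blocking crawlers while the
        // "discourage search engines" option is on, so the noindex meta
        // tag can actually be seen.
        function siebje_allow_crawling_when_not_public( $output, $public ) {
            if ( '0' == $public ) {
                // An empty Disallow means "allow everything".
                $output = "User-agent: *\nDisallow:\n";
            }
            return $output;
        }
        add_filter( 'robots_txt', 'siebje_allow_crawling_when_not_public', 10, 2 );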

    Kind regards,

    Willem

  • The topic ‘Discourage search engines bug’ is closed to new replies.