• On 1/4/15, I switched web hosts. After the dust settled a bit, I checked my Webmaster Tools accounts. Out of 6 WP installs, all of them running Wordfence, 4 could not be retrieved by Googlebot. Of the two that were OK, one was running the Falcon engine, and Googlebot could fetch both robots.txt and the base URL (/) without problems.

    Of the 4 that could not be retrieved, I had the Falcon engine running on my main site but not on the addon domains. After literally hours of testing, I finally removed the Falcon engine from the main site, and all four sites can now be retrieved by Googlebot. Please do not ask me to explain this because I can’t. I did see that the blocking had been going on since about 11/24/14, which is about when I enabled Falcon on my main site. I hope someone can explain this, because Falcon is by far the fastest caching engine I have tested for WordPress.

    I have the code saved that was removed and will look at it after some sleep.

    https://www.ads-software.com/plugins/wordfence/

Viewing 2 replies - 1 through 2 (of 2 total)
  • If you are uncomfortable posting it here, you can email it to me at tim [at] wordfence.com. The only reason we might be blocking Google is if country blocking for the US was enabled and if you set the firewall options wrong. I haven’t ever seen it happen otherwise. If Google reported any specific errors, or there are errors in your logs, could we see those too?

    Thanks and hopefully we can get this sorted out.

    tim

    Thread Starter flyfisher842

    (@flyfisher842)

    Not at all uncomfortable. The problem was caused by my using the advanced blocking User Agent function on my main site to block bot* and another pattern, crawl. Then I copied those blocks over to 3 other WordPress sites.

    When the Falcon engine is enabled, all the IP blocks and other advanced blocks get inserted at the head of the .htaccess file. The User Agent blocks use the env module to set a filter for whatever names were entered in the User Agent advanced block slot. In my case, bot* and crawl.
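A minimal sketch of how a filter like this might end up catching Googlebot, offered as an illustration rather than a description of what Wordfence actually writes. It assumes the two patterns from the post (bot* and crawl) reach the server unchanged and are tested against the User-Agent header either as unanchored, case-insensitive regular expressions or as whole-string wildcards. The function names and the sample browser string are hypothetical.

```python
import fnmatch
import re

# Patterns taken from the post; how the plugin translates them is an assumption.
PATTERNS = ["bot*", "crawl"]

# Published user-agent strings for the two crawlers named in the topic title.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
MSNBOT_UA = "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
# Hypothetical ordinary browser, for comparison.
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1; rv:34.0) Gecko/20100101 Firefox/34.0"


def blocked_as_regex(user_agent, patterns=PATTERNS):
    """Treat each pattern as an unanchored, case-insensitive regex.

    Read as a regex, "bot*" means "bo" followed by zero or more "t",
    so it matches any user agent that contains "bo" anywhere.
    """
    return any(re.search(p, user_agent, re.IGNORECASE) for p in patterns)


def blocked_as_wildcard(user_agent, patterns=PATTERNS):
    """Treat each pattern as a wildcard that must match the whole string."""
    ua = user_agent.lower()
    return any(fnmatch.fnmatchcase(ua, p.lower()) for p in patterns)


if __name__ == "__main__":
    for name, ua in [("Googlebot", GOOGLEBOT_UA),
                     ("msnbot", MSNBOT_UA),
                     ("browser", BROWSER_UA)]:
        print(name, blocked_as_regex(ua), blocked_as_wildcard(ua))
```

Under the regex reading, both crawler strings are caught while the sample browser is not. Under the whole-string wildcard reading, none of the three are. If the filter behaves like the first case, the normal Googlebot user agent would be blocked by bot* on its own.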

    Fetch as Google in Webmaster Tools failed while that env filter was in my main .htaccess file. Immediately after I removed it from the main .htaccess file, every one of the 4 blocked sites became accessible to Fetch as Google. This leads me to believe that Google is using a bot with a User Agent of bot* or crawl for Fetch as Google and for crawling sites, without identifying itself as Googlebot or any Googlebot derivative. My guess is bot*, since I have seen it crawl a lot of my sites over the years, and at high page volume.

    If that is true, I suspect Google compares the two crawls to see if manipulation is going on.

  • The topic ‘falcon engine blocking googlebot and msnbot’ is closed to new replies.