• allm

    (@realblueorange)


    It seems that the detection of Fake Google Crawlers is flawed.

    This is listed in the list of my blocked IPs:

    Reason: Fake Google crawler automatically blocked
    Hostname: crawl-66-249-69-96.googlebot.com

    I’ll keep an eye on this to see if it happens more often.

    Or is there something I do not see correctly?

    https://www.ads-software.com/plugins/wordfence/
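For readers who want to check a blocked address like the one above by hand: Google documents a two-step check for its crawlers, a reverse DNS lookup followed by a forward lookup that must return the original IP. The Python sketch below illustrates that general procedure only. It is not Wordfence's implementation, and the function names and the suffix list are assumptions made for this example.

```python
import socket
from typing import Callable, List

# Hostname suffixes accepted in this sketch (an assumption for illustration).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """Return the PTR hostname for ip using the system resolver."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """Return every address the hostname resolves to."""
    return sorted({info[4][0] for info in socket.getaddrinfo(hostname, None)})


def is_verified_googlebot(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Reverse-resolve ip, check the domain, then forward-confirm the IP."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if not hostname.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        # The forward lookup must return exactly the string we started with.
        return ip in forward(hostname)
    except OSError:
        return False
```

The resolvers are passed in as parameters so the logic can be exercised without network access. Note that the final step is a plain string comparison, so the address has to be in the same textual form that the forward lookup returns.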

Viewing 8 replies - 1 through 8 (of 8 total)
  • Thread Starter allm

    (@realblueorange)

    Addition:

    On another site I see a lot of blocks for Google Crawlers on the IP range 66.249.66.xxx

    They all seem like legitimate Google IPs.

    That is not good for SEO, is it?

    Can you please look into this?

    Plugin Author WFMattR

    (@wfmattr)

    Hi,

    Thanks for the report. If “Immediately block fake Google crawlers” is still enabled, make sure to disable it on these sites.

    Do you know the details of how the site’s server is set up? I’ve seen one case like this, where a reverse proxy was converting all IPv4 addresses to IPv4-mapped IPv6 addresses instead of passing them through as IPv4. That could be the cause here.

    -Matt R
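To make the term concrete: an IPv4-mapped IPv6 address embeds the IPv4 address in the form `::ffff:a.b.c.d`. The sketch below uses Python's standard `ipaddress` module to tell the three cases apart. The helper name `describe_address` is made up for this illustration, and the sample addresses are taken from this thread or from documentation ranges.

```python
import ipaddress


def describe_address(raw: str) -> str:
    """Classify raw as plain IPv4, IPv4-mapped IPv6, or native IPv6."""
    addr = ipaddress.ip_address(raw)
    if isinstance(addr, ipaddress.IPv4Address):
        return "ipv4"
    # IPv6Address.ipv4_mapped is None unless the address is in ::ffff:0:0/96.
    if addr.ipv4_mapped is not None:
        return "ipv4-mapped ipv6 (embeds {})".format(addr.ipv4_mapped)
    return "ipv6"


for sample in ("66.249.69.96", "::ffff:66.249.69.96", "2001:db8::1"):
    print(sample, "->", describe_address(sample))
```

Code that expects a dotted IPv4 string would see the second form as a different address, even though it refers to the same host.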

    Thread Starter allm

    (@realblueorange)

    Hi Matt,

    I have seen several cases where IPv4 addresses were made to look like IPv6 addresses.

    Is Wordfence going to fix the check for Fake Google Crawlers for this?

    Disabling the functionality completely seems like a crude solution. Maybe something for the short term, just until things are fixed.

    Can you elaborate?

    Plugin Author WFMattR

    (@wfmattr)

    Hi,

    Yes, sorry that wasn’t clear. I meant that you should disable it while we discuss the details, so it doesn’t block possibly legitimate traffic in the meantime!

    Can you tell me more about the server? Is it a VPS that you control, or run by a hosting company? If you control it (or know about the setup), is it using nginx or a caching proxy like Varnish, in front of Apache?

    The previous report of a similar problem is here, including the nginx option to change, in case it is the same situation.

    If it’s the same as the other case, we do have a fix planned for a future version, but I don’t know when that will be yet.

    -Matt R

    Thread Starter allm

    (@realblueorange)

    Hi Matt,

    Yes, I disabled it for the time being, so I do not get punished by Google for blocking their bot.

    I have notified the hosting company about this and the answer is still somewhere in the pipeline. It might have something to do with load balancers, but I’m not sure.

    I’ll wait for the hosting company and see what they come up with and report back here.

    I checked the link you provided, and will send that to the hosting company as well.

    Thanks for now.

    PS Might be an idea to have this solved by dev, because there are probably more cases in the field you don’t know about…

    Plugin Author WFMattR

    (@wfmattr)

    Hi,

    Thanks, let me know what you find out.

    If it is the same issue that I mentioned another user had, we do already have a case assigned to the dev team, to support setups where addresses show up in this format.

    -Matt R

    Thread Starter allm

    (@realblueorange)

    Hi Matt,

    The hosting company replied. They confirmed what I concluded (IPv4 addresses are mapped to IPv6 addresses) and they told me they will not change it because this seems to use fewer sockets, making it more efficient.

    If that is the case, I can understand why they do it and why they want to keep it that way.

    They provided me with a piece of PHP code that converts $_SERVER['REMOTE_ADDR'] to the IPv4 format in case of mapping. I need to insert that before WordFence kicks in.

    I think it is best for me to wait until WordFence fixes this issue. I’m probably not the only one out there with this problem. Just one of the first to notice and report it…

    Do you have any idea when this will be fixed in Wordfence? Do you want me to send you the piece of PHP that fixes the mapping?
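The host's PHP snippet is not shown in this thread, so the following is only a rough Python sketch of the kind of conversion described: if the remote address is IPv4-mapped, replace it with the embedded IPv4 address, and otherwise leave it alone. The function name `normalize_remote_addr` and the `server` dictionary (standing in for PHP's `$_SERVER`) are assumptions for illustration.

```python
import ipaddress


def normalize_remote_addr(raw: str) -> str:
    """Return the embedded IPv4 address for IPv4-mapped input, else raw."""
    try:
        addr = ipaddress.ip_address(raw.strip())
    except ValueError:
        # Not a parseable address: leave it for later validation.
        return raw
    # Only IPv6Address objects carry ipv4_mapped; default to None otherwise.
    mapped = getattr(addr, "ipv4_mapped", None)
    return str(mapped) if mapped is not None else raw


# Hypothetical stand-in for $_SERVER, rewritten before any IP checks run.
server = {"REMOTE_ADDR": "::ffff:66.249.66.1"}
server["REMOTE_ADDR"] = normalize_remote_addr(server["REMOTE_ADDR"])
```

Plain IPv4 and native IPv6 addresses pass through unchanged, so the conversion is safe to apply to every request.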

    Plugin Author WFMattR

    (@wfmattr)

    Hi,

    Thanks for the additional details. I don’t have an estimate of when it will be implemented yet, but I notified the dev team that this may be an issue for more people. The reference number is FB1317, in case you (or other readers) need to mention it in another post.

    Do you know what web server the host is running — whether it is nginx using CGI/FCGI, or nginx as a front end proxy to Apache, Varnish in front of Apache, or something else? The other case was the first option, but if there are other setups that may have the same issue, we’ll test each one that we find.

    -Matt R

  • The topic ‘fake google crawler not OK?’ is closed to new replies.