• Resolved wizard71

    (@wizard71)


    Hi,

    For the last several days, I’ve been bombarded by the facebookexternalhit/1.1 bot, which generates hundreds of requests per minute. CPU usage climbs to almost 100% and the website becomes really slow.

    I’ve already limited the number of page requests per minute, but it didn’t help, as more and more different IPs kept coming to the website.

    I’d like to block these IPs, since they serve no real purpose and just create unnecessary load on the server. However, the plugin doesn’t let me blacklist any of them.

    https://ibb.co/HnMWLcr
    https://ibb.co/GFHnXVQ
    https://ibb.co/KG09HBw

    Any ideas?


Viewing 11 replies - 16 through 26 (of 26 total)
  • Hi @eugenecleantalk!

    I didn’t try the lists. I blocked Facebook for a few days, which stopped the heavy scanning.

    Hi @theginaddict!

    Robots.txt doesn’t help. Instead, I disabled Facebook in BotSpy24 under the Disable Crawlers menu.

    Plugin Support eugenecleantalk

    (@eugenecleantalk)

    Hello @theginaddict,

    Previously in this topic, we recommended using personal lists to block a bot: https://cleantalk.org/help/security-firewall.

    Adding these networks to your personal lists should solve the issue:

    66.220.144.0/21
    69.171.224.0/20
    173.252.64.0/19
    31.13.64.0/18
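For anyone who wants to check their own logs against these networks before adding them, here is a minimal Python sketch using the standard `ipaddress` module. The four ranges are copied from the reply above and are only a snapshot (an assumption: Facebook's address space changes over time, and these ranges may not cover every crawler address). The names `FACEBOOK_NETWORKS` and `in_facebook_networks` are hypothetical.

```python
import ipaddress

# Networks as posted in the support reply above. Assumption: this is a
# snapshot, not an authoritative or complete list of Facebook addresses.
FACEBOOK_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "66.220.144.0/21",
        "69.171.224.0/20",
        "173.252.64.0/19",
        "31.13.64.0/18",
    )
]


def in_facebook_networks(ip: str) -> bool:
    """Return True if the address falls inside any of the listed networks.

    IPv6 addresses are accepted but never match, because every listed
    network is IPv4 (mixed-version membership tests simply return False).
    """
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in FACEBOOK_NETWORKS)
```

For example, `in_facebook_networks("31.13.115.11")` returns `True`: that address, which appears in the access-log line quoted later in this thread, lies inside 31.13.64.0/18.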

    Hi @eugenecleantalk,

    Thank you very much for the IPs.
    I’m adding them now, and I’ll check with my hosting provider whether that solves the problem.

    Plugin Support eugenecleantalk

    (@eugenecleantalk)

    @theginaddict, you’re welcome! Let us know if you need more help.

    Plugin Support eugenecleantalk

    (@eugenecleantalk)

    There’s another solution. You can install our other plugin – Anti-Spam by CleanTalk.

    There, you need to enable the ‘Anti-Crawler’ option in the plugin’s Advanced settings (screenshot).

    And then add the FacebookBot user agent to personal lists (screenshot) according to this guide.

    I’m having similar problems.
    While looking for an effective solution, I came across an interesting discussion (albeit from six years ago) on Stack Overflow: https://stackoverflow.com/questions/49577546/facebook-crawler-is-hitting-my-server-hard-and-ignoring-directives-accessing-sa
    This answer is especially worth a look (though a few others are too):
    “I received word back from the Facebook team themselves. Hopefully, it brings some clarification to how the crawler treats image URLs.

    Here it goes:

    The Crawler treats image URLs differently than other URLs.

    We scrape images multiple times because we have different physical regions, each of which need to fetch the image. Since we have around 20 different regions, the developer should expect ~20 calls for each image. Once we make these requests, they stay in our cache for around a month – we need to rescrape these images frequently to prevent abuse on the platform (a malicious actor could get us to scrape a benign image and then replace it with an offensive one).

    So basically, you should expect that the image specified in og:image will be hit 20 times after it has been shared. Then, a month later, it will be scraped again.”

    I don’t know whether Facebook still treats its crawler this way, but judging by the logs and the cyclic load on the servers, it seems it may.
    And a question for those using this plugin: https://github.com/nadimtuhin/Facebook-Request-Throttle-WordPress-Plugin
    Have you had any problems with it, and does it actually limit the “activity” of Facebook bots as much as expected?
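The general idea of throttling a single crawler can be sketched in Python. This is a hypothetical illustration, not the code of the plugin linked above (which may work quite differently): requests whose User-Agent contains `facebookexternalhit` are served at most once per `min_interval` seconds, and the rest get HTTP 429 Too Many Requests. The class name, its parameters, and the 2-second default are all assumptions.

```python
import time
from typing import Callable, Optional


class CrawlerThrottle:
    """Serve at most one request per `min_interval` seconds to clients whose
    User-Agent contains `ua_marker`; all other clients are never throttled.

    Hypothetical helper for illustration; not any plugin's actual code.
    """

    def __init__(self, ua_marker: str = "facebookexternalhit",
                 min_interval: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ua_marker = ua_marker
        self.min_interval = min_interval
        self.clock = clock  # injectable so the logic can be tested
        self._last_allowed: Optional[float] = None

    def status_for(self, user_agent: str) -> int:
        """Return the HTTP status to send: 200 to serve, 429 to throttle."""
        if self.ua_marker not in user_agent:
            return 200
        now = self.clock()
        if (self._last_allowed is not None
                and now - self._last_allowed < self.min_interval):
            return 429  # Too Many Requests
        self._last_allowed = now
        return 200
```

The timestamp is deliberately shared across all crawler IPs rather than kept per IP, because the opening post reports that per-IP limits stopped helping once more and more addresses appeared. A real site would need to keep that timestamp in storage shared by every worker process; this in-memory version only works within a single process.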

    Plugin Support eugenecleantalk

    (@eugenecleantalk)

    Hello @tripsoverpoland,

    We didn’t quite understand your request, so we’d like to draw your attention to an earlier post that describes the solution to the issue: https://www.ads-software.com/support/topic/facebookexternalhit-1-1-thousands-of-requests/page/2/#post-17880083.

    I’m not interested in that particular solution: the plugin is too big and probably consumes too many resources. It’s also paid, which may rule it out for many use cases.
    I’ve installed https://github.com/nadimtuhin/Facebook-Request-Throttle-WordPress-Plugin and will check whether it is effective in combination with the other measures I’ve already implemented.

    This could also be a bad bot, or a human, using the Facebook user agent.

    The user agent: facebookexternalhit/1.1 (+https://www.facebook.com/externalhit_uatext.php)

    It came from multiple IPs, including one in Singapore.

    I suspect there’s more going on here than a misconfigured or broken crawler.
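One way to test the spoofing suspicion is to check whether a request carrying the facebookexternalhit user agent actually comes from a Facebook network. Below is a minimal Python sketch, assuming the four IPv4 ranges posted earlier in the thread are representative. They are only a snapshot: Facebook may crawl from other ranges, including IPv6, so a "spoofed" label is a lead to investigate, not proof. The name `classify_request` is hypothetical.

```python
import ipaddress

# Snapshot of the IPv4 networks posted earlier in this thread (assumption:
# incomplete, e.g. no IPv6, and it should be refreshed periodically).
FB_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("66.220.144.0/21", "69.171.224.0/20",
                 "173.252.64.0/19", "31.13.64.0/18")
]

FB_UA_MARKER = "facebookexternalhit"


def classify_request(ip: str, user_agent: str) -> str:
    """Label a request 'facebook', 'spoofed' or 'other'.

    'spoofed' means the User-Agent claims to be facebookexternalhit but the
    address lies outside every listed network (hypothetical helper).
    """
    if FB_UA_MARKER not in user_agent:
        return "other"
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in FB_NETWORKS):
        return "facebook"
    return "spoofed"
```

Running the suspicious Singapore address through such a check would show whether it at least sits in a listed Facebook network; an address outside them is worth a closer look, but not automatically malicious.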

    A few more words about the effects of blocking the FB robot, and a possible temporary workaround.

    What effect does the way FB handles images from our blog (the ones used when posts are shared on FB) have on our hosting? (The problem affects many hosts, both in Poland and abroad.) If you look at the logs of your blogs or other websites, you will notice characteristic lines such as this one:
    31.13.115.11 - - [12/Aug/2024:13:05:55 +0200] "GET /wp-content/uploads/2016/08/16_Olsztyn_kolo_Czestochowy_246.jpg HTTP/2" 200 547895 "-" "facebookexternalhit/1.1 (+https://www.facebook.com/externalhit_uatext.php)"

    • The culprit is visible at the end of this line:
      “facebookexternalhit/1.1 (+https://www.facebook.com/externalhit_uatext.php)”.
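To see how much of this traffic a site is getting, the access log can be summarised per client IP. Below is a minimal Python sketch, assuming the log uses the common "combined" format (as the line above appears to) with the plain ASCII `"` and `-` characters that the web server actually writes; forum software may render them as typographic quotes. The regex and the name `facebook_hits` are assumptions, not part of any tool mentioned in this thread.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# "combined" access-log format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def facebook_hits(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count facebookexternalhit requests and bytes served, per client IP."""
    hits: Counter = Counter()
    bytes_sent: Counter = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m is None or "facebookexternalhit" not in m["ua"]:
            continue  # skip unparsable lines and other user agents
        hits[m["ip"]] += 1
        if m["bytes"] != "-":
            bytes_sent[m["ip"]] += int(m["bytes"])
    return hits, bytes_sent
```

Passing it an open log file and inspecting `hits.most_common(10)` shows which addresses generate the most crawler requests. As a rough worked figure: the image in the sample line is 547,895 bytes, so if Facebook fetches it about 20 times per scrape cycle (as in the Stack Overflow quote above), that single image accounts for roughly 20 × 547,895 ≈ 11 MB of transfer.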

    The load the Facebook robot puts on hosts can even lead to pages/blogs being suspended.
    For now, we have decided to block this robot completely, and the effects of the block are already visible on FB.

    At the very top of our FB page, in the featured section (i.e. the pinned blog posts published on FB), most of the title images that FB downloaded when links to the blog posts were published have been missing since yesterday. The same goes for older blog posts further down our page.

    Blocking the robot only affects images that FB pulls from blogs and other websites; images and videos published directly on FB stay visible the whole time.

    Temporary workaround for those publishing blog entries:
    besides publishing a link to the entry on FB (from which FB downloads the title image), you can also publish the title image itself. However, anyone who clicks the image will then see an enlarged version of it, not the linked blog entry or the target website.
    You can also add the content of the entry, with a link to the blog/page, to the image description on FB, but that puts the blog or website one or more clicks further away from a reader who just wants a quick look.

    We are looking for a middle-ground solution, but the facebookexternalhit robot described above has been operating like this for many years, and FB, probably well aware of the hosting problems, does nothing about it.
