• Resolved dicknet

    (@dicknet)


    Hello. Having a strange problem here. Under most circumstances the plugin works beautifully. However, I’m having trouble with usage spikes from avalanches of bots hitting my sites for the same page at roughly the same time.

    What you would expect is that the first such bot would get a WP-generated page. That page is cached and subsequent bots are served the cached page. I’m not sure that’s what’s happening.

    Let’s say the first bot arrives and WP starts to generate the page. Then the second bot arrives while WP is still working. Will WP go to work generating the page for the second bot? Or will the cache make him wait so it can serve him the cached page?

    TIA for your help.

    https://www.ads-software.com/extend/plugins/quick-cache/

Viewing 5 replies - 1 through 5 (of 5 total)
  • That’s very intriguing. I think you’re right: if the page is still in the process of being generated and cached, another visitor arriving at that moment likely kicks off another page-generation/caching pass, and so on.

    I do notice that the /cache directory has a “lock” file, which may indicate QC checks for the lock file and, if it exists, tells the other process to wait. At least that is what I would do. I guess you could test this out by writing your own test code with multiple threads: bang the site 10x asynchronously and see what is returned. If the footer shows the same time (down to the milliseconds; not sure if QC shows milliseconds, so you may need to add that into the PHP file) then you know it’s the same page.
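    Something along these lines would do it (a rough sketch only; the URL and the footer-matching pattern are assumptions you’d adjust to your own site and to whatever marker QC actually prints in its footer comment):

        <?php
        // Rough concurrency test: fire N requests at the same URL at once and
        // compare the cache footer comment in each response. If every copy
        // carries the identical footer (same build timestamp), one cached file
        // was served to everyone; if they differ, each request triggered its
        // own page build.
        // Assumptions: $url and the preg_match() pattern below.

        $url         = 'https://example.com/some-new-post/';
        $concurrency = 10;

        $mh      = curl_multi_init();
        $handles = array();

        for ($i = 0; $i < $concurrency; $i++) {
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_multi_add_handle($mh, $ch);
            $handles[] = $ch;
        }

        // Run all handles until every transfer has finished.
        do {
            $status = curl_multi_exec($mh, $running);
            if ($running) {
                curl_multi_select($mh);
            }
        } while ($running && $status == CURLM_OK);

        $footers = array();
        foreach ($handles as $ch) {
            $body = curl_multi_getcontent($ch);
            // Assumed pattern: grab the whole cache footer comment, if any.
            if (preg_match('/<!--.*?Quick Cache.*?-->/is', $body, $m)) {
                $footers[] = $m[0];
            } else {
                $footers[] = 'no cache footer found';
            }
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
        }
        curl_multi_close($mh);

        // One distinct footer means one page build served to everyone.
        echo count(array_unique($footers)) . " distinct footer(s):\n";
        print_r($footers);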

    Btw, you mention bots. Are these bots run by you, so that you can control when they hit your sites?

    I assume they aren’t, so maybe you could just set the cache expiry time so pages expire and are refreshed ahead of the bot traffic (e.g. every hour) and have the plugin do an auto-rebuild of the file (notice it has this feature in its config panel).

    Thread Starter dicknet

    (@dicknet)

    Bots, how to explain the bots? With most of my sites, new posts get posted on Twitter by one mechanism or another. Apparently there are a large number of bots (scrapers, scammers, and some legit) that are poised waiting for new tweets with URLs in them. When they find them they all make a beeline for the URLs in question. It wouldn’t be a problem, but for the fact they all hit at once, bringing my server to its knees. I have banned many of the most egregious offenders and those that seem to offer me little or no benefit. The spikes are lower and less frequent, but they’re still there.

    But this brings me back to my original thought: if the first bot causes the page to be cached and subsequent bots are served a copy, it SHOULDN’T bring the server to its knees. So what’s really going on?

    One thing I was wondering is if there is a setting to cache a page as soon as it’s posted. So if traffic comes along shortly for that page it will already be cached. I’m not seeing it in the options, but some of the terminology is confusing and I could be missing it.

    Just checking the QC config and I do notice “Sitemap Auto-Caching”. Notice the field: “List Of Additional URLs, to ‘Auto-Cache'”. From the description it seems to suggest that when the page expires it will do a Loopback Request onto the page, resulting in new page creation on its own. So that should be your solution, assuming you set the expiry time just moments before tweeting the new URL.
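    If you don’t want to rely on the expiry timing, you could also warm the page yourself the instant it’s published. A rough do-it-yourself sketch (not a Quick Cache feature; it just uses the standard publish_post hook and wp_remote_get from a small plugin or your theme’s functions.php):

        <?php
        // Rough sketch: as soon as a post is published, request its front-end
        // URL once so the page is generated (and, for anonymous visitors,
        // cached) before the bots that follow the tweet show up.
        // Assumption: your caching plugin caches pages for anonymous requests
        // like this one.

        add_action('publish_post', function ($post_id) {
            $url = get_permalink($post_id);
            if (!$url) {
                return;
            }
            // Blocking request so the page render actually completes; it adds
            // a second or two to the publish action, which is usually fine.
            wp_remote_get($url, array('timeout' => 15));
        }, 10, 1);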

    Just realized you could also set up a cron job (via your Control Panel, or equivalent) to delete any individual page(s) from the cache and rebuild them immediately at whatever time you want. That is what I do for one of my sites for a special case. A rough sketch of the kind of script cron would call is below.
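    Something like this, run from a crontab entry such as */30 * * * * php /path/to/rebuild-page.php (the cache file path and page URL here are assumptions; look inside your /cache directory to see how QC actually names its files):

        <?php
        // Rough sketch of the cron idea: drop one page's cached copy, then
        // re-request the page so a fresh copy is generated and re-cached
        // before the next wave of visitors (or bots) arrives.
        // Assumptions: $cached_file and $page_url below.

        $cached_file = '/var/www/example.com/wp-content/cache/EXAMPLE-CACHE-FILE.html';
        $page_url    = 'https://example.com/important-page/';

        // 1. Remove the stale cached copy, if present.
        if (file_exists($cached_file)) {
            unlink($cached_file);
        }

        // 2. Hit the page once so WordPress regenerates and re-caches it.
        $ch = curl_init($page_url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_USERAGENT, 'cache-rebuild-cron');
        curl_exec($ch);
        $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);

        echo "Rebuild request returned HTTP $status\n";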

    Btw, you could set up a CDN to serve the cached page too so that once it does get cached it is placed on a separate (much faster) server for those concurrent hits.

    If you really want to geek out, there might be a throttle-like WP plugin out there, but if not it wouldn’t be too difficult to write: in your header.php file you could run your own special PHP function that checks and stores the visitor IP. If he returns often at around the same time each day you could throttle him, and you could keep a file-based count of how many people are accessing the page at the moment. When it drops below a certain amount you let him proceed (or alternatively, just return an HTTP 503 status code so he can try again later). A rough sketch of the concurrency-count part is below.
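    Just to show the shape of it (a sketch only, not production-ready; the limit and counter-file location are assumptions, and a real version would also track per-IP history and clean up after itself):

        <?php
        // Rough throttle sketch: keep a crude count of requests currently in
        // flight and send HTTP 503 to new visitors once it passes a threshold.
        // Place this very early in the request (top of header.php or a
        // must-use plugin). Assumptions: $max_concurrent and $counter_file.

        $max_concurrent = 20;
        $counter_file   = sys_get_temp_dir() . '/page-inflight-count';

        $fp = fopen($counter_file, 'c+');
        if ($fp && flock($fp, LOCK_EX)) {
            $count = (int) stream_get_contents($fp);

            if ($count >= $max_concurrent) {
                flock($fp, LOCK_UN);
                fclose($fp);
                header('HTTP/1.1 503 Service Unavailable');
                header('Retry-After: 30');
                exit('Busy; please retry shortly.');
            }

            // Record that this request is now in flight.
            ftruncate($fp, 0);
            rewind($fp);
            fwrite($fp, (string) ($count + 1));
            flock($fp, LOCK_UN);
            fclose($fp);

            // Decrement once the request finishes, even after a fatal error.
            register_shutdown_function(function () use ($counter_file) {
                $fp = fopen($counter_file, 'c+');
                if ($fp && flock($fp, LOCK_EX)) {
                    $count = max(0, (int) stream_get_contents($fp) - 1);
                    ftruncate($fp, 0);
                    rewind($fp);
                    fwrite($fp, (string) $count);
                    flock($fp, LOCK_UN);
                    fclose($fp);
                }
            });
        }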

    Thread Starter dicknet

    (@dicknet)

    Wow, kellogg9, thanks for all the out-of-the-box ideas. Forces me to look at things differently, which is good.

    I was thinking the Sitemap Auto-Caching was something completely different (i.e., IT would make a sitemap). I can’t see a way to immediately cache a new post (and it would have to be immediate because these bots are fast). HOWEVER, most human users are going to come to the main page first, so auto-caching that should relieve some load as well as giving users faster response than they otherwise may have gotten.

    The CDN is, frankly, beyond my capabilities. I can barely keep my plain vanilla VPS together.

    The throttle idea got me thinking, though. I found a plugin called Wordfence that does a whole lot of useful stuff. For one thing it can automatically block Googlebot impersonators. It can also throttle or block users that hit the site too hard – and this is highly configurable. So I’ve installed that and am evaluating it. If it works out I’ll install it everywhere.

    So thanks for the suggestions and for making me look at this differently.

  • The topic ‘Possible race condition under heavy load’ is closed to new replies.