• Can you PLEASE help define these settings from the Rate Limiting section?
    I have read the detailed descriptions, which sometimes seem to contradict each other.

    “If a crawler’s page views exceed…”
    Is this truly PAGE based or is it REQUEST based?
    Many crawlers/bots directly target (fish for) single files, many per minute, without ever going to a PAGE.

    “If a crawler’s pages not found (404s) exceed…”
    Again, is this a PAGE-based 404 or a FILE-based 404? Huge difference.
    A page can return “200” whilst files and links included in it return “404”.

    “If a human’s page views exceed…”
    Is this strictly PAGE based?
    The description recommends 240, but how can a human navigate anywhere close to 240 pages in 60 seconds?
    That sounds more like REQUEST/FILE based (as seen in Raw Access logs).

    “If a human’s pages not found (404s) exceed…”
    Again, is this a PAGE-based 404 or a FILE-based 404?
    The description focuses on FILE-based 404s, which sounds different.

    “If 404s for known vulnerable URLs exceed …”
    Across all of these settings, I see the following terms used:
    – Pages
    – Requests
    – URLs
    – Files
    Maybe it’s just me, but it is a bit confusing, especially when a setting says PAGE and the detailed description talks about FILES.
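    To make the 404 question concrete, here is a minimal, purely hypothetical sketch (mine, not anything from Wordfence’s docs) of why a page-based count and a file-based count of the same traffic differ: the HTML document itself can return 200 while several of the files it pulls in return 404.

```python
# Hypothetical illustration (not Wordfence code): one "page view" whose HTML
# document returns 200 can still generate several file-level 404s.
page_load = [
    ("/blog/some-post/",            200),  # the page (HTML document) itself
    ("/wp-content/themes/x/a.css",  200),  # stylesheet loads fine
    ("/wp-content/uploads/old.png", 404),  # missing image
    ("/wp-content/uploads/old.js",  404),  # missing script
]

# Crude "is a page" test: a trailing slash, i.e. an HTML document URL.
page_404s = sum(1 for path, status in page_load
                if status == 404 and path.endswith("/"))
file_404s = sum(1 for _, status in page_load if status == 404)

print("page-based 404 count:", page_404s)   # 0 -- the page itself was fine
print("file-based 404 count:", file_404s)   # 2 -- two broken included files
```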

    Cheers

  • I spent hours trying to figure out what was going on with Rate Limiting. The folks at Wordfence helped, but I never did get to be 100% clear on it. Instead, I just set everything to block, watched it carefully for a few months while tweaking settings, and that’s that. It seems to work fine so long as I’m careful about the settings. MTN

    Thread Starter themadproducer

    (@themadproducer)

    @mountainguy2
    Thanks for your input.
    Care to share your rate limit settings?

    I have been using WF for several years, but recently I have been fine-tuning WF and my .htaccess file to stop constant bad bots and reduce server CPU load.

    The exact wording WF uses is crucial because (see the sketch below):
    – a request (as seen in a server log) counts as 1
    – a page can equal dozens of requests
    – a URL can be either of the above
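    As a rough illustration of that difference (my own hypothetical sketch, not how Wordfence actually counts), here is one way to tally raw requests versus page views per IP per minute from a combined-format access log; whether a hit is a “page” is just guessed from the requested path.

```python
import re
from collections import Counter

# Hypothetical sketch: tally raw requests vs "page views" per IP per minute
# from an Apache/LiteSpeed combined access log. A "page view" is guessed as
# any request whose path does not look like a static asset.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)
ASSET_EXT = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg",
             ".ico", ".woff", ".woff2", ".ttf", ".map")

def tally(log_lines):
    requests, page_views = Counter(), Counter()
    for line in log_lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        ip = m.group("ip")
        minute = m.group("ts")[:17]        # "10/Oct/2023:13:55:01 +0000" -> "10/Oct/2023:13:55"
        path = m.group("path").split("?")[0].lower()
        requests[(ip, minute)] += 1
        if not path.endswith(ASSET_EXT):   # crude guess: not a static asset => a page
            page_views[(ip, minute)] += 1
    return requests, page_views

# Example: one visitor loading one page that pulls in two assets.
sample = [
    '203.0.113.9 - - [10/Oct/2023:13:55:01 +0000] "GET /blog/post/ HTTP/1.1" 200 5120',
    '203.0.113.9 - - [10/Oct/2023:13:55:02 +0000] "GET /wp-content/themes/x/a.css HTTP/1.1" 200 900',
    '203.0.113.9 - - [10/Oct/2023:13:55:02 +0000] "GET /wp-content/uploads/b.png HTTP/1.1" 200 40960',
]
reqs, pages = tally(sample)
print(sum(reqs.values()))    # 3 requests in that minute
print(sum(pages.values()))   # but only 1 page view
```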

    Hi Thema, here are my settings. Please don’t ask me to explain; I just experimented for a while and this seems to work for me. It’s pretty hardcore. My site is heavily managed and curated, with very few bad links, for example.

    Everything set to “block it.” 48 hours.

    How should we treat Google’s crawlers: verified crawlers have unlimited access

    If anyone’s requests exceed 120 per minute

    If a crawler’s page views exceed 30 per minute

    If a crawler’s pages not found (404s) exceed 15 per minute

    If a human’s page views exceed 60 per minute

    If a human’s pages not found (404s) exceed 15 per minute

    If 404s for known vulnerable URLs exceed 3 per minute
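    For anyone trying to picture how per-minute thresholds like these behave, here is a hypothetical sketch of a fixed one-minute-window counter using MTN’s numbers. It is NOT Wordfence’s implementation; is_crawler, is_page and the blocked set are stand-ins for real detection and blocking, and the “known vulnerable URLs” rule is left out.

```python
import time
from collections import defaultdict

# Hypothetical sketch only -- not Wordfence's code. It shows how per-minute
# thresholds like the settings above could be enforced with a fixed
# one-minute window keyed on (ip, who, metric, minute).
LIMITS = {
    ("any",     "requests"):   120,   # if anyone's requests exceed
    ("crawler", "page_views"):  30,   # if a crawler's page views exceed
    ("crawler", "404s"):        15,   # if a crawler's 404s exceed
    ("human",   "page_views"):  60,   # if a human's page views exceed
    ("human",   "404s"):        15,   # if a human's 404s exceed
}

counts = defaultdict(int)   # (ip, who, metric, minute) -> count
blocked = set()             # stand-in for "block it" (48 hours in the real setting)

def record(ip, is_crawler, is_page, status):
    minute = int(time.time() // 60)
    who = "crawler" if is_crawler else "human"
    hits = [("any", "requests")]
    if is_page:
        hits.append((who, "page_views"))
    if status == 404:
        hits.append((who, "404s"))
    for hit_who, metric in hits:
        key = (ip, hit_who, metric, minute)
        counts[key] += 1
        if counts[key] > LIMITS[(hit_who, metric)]:
            blocked.add(ip)

# Example: a bot requesting 20 missing files in one minute trips the
# crawler 404 limit (15) long before the overall request limit (120).
for _ in range(20):
    record("198.51.100.7", is_crawler=True, is_page=False, status=404)
print("198.51.100.7" in blocked)   # True
```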

    As for .htaccess helping with bandwidth, I’ve found that some of the worst offenders are the feed scrapers. My feed is for private individual consumption only; the scrapers are violating my terms/copyright, but they do it anyway. I spend quite a bit of time blocking them by IP, which isn’t easy.
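    On the feed-scraper point, a hypothetical helper along these lines (the log path, feed paths and threshold are all assumptions) can at least automate finding the offending IPs and printing deny lines to paste into .htaccess; it does not make blocking by IP any less of a whack-a-mole exercise.

```python
import re
from collections import Counter

# Hypothetical helper: find IPs hitting the feed unusually often in a raw
# access log and print classic Apache deny lines for .htaccess.
LOG_RE = re.compile(r'^(?P<ip>\S+) .* "(?:GET|HEAD) (?P<path>\S+) ')
FEED_THRESHOLD = 50   # arbitrary: flag IPs with more than 50 feed hits

def feed_offenders(log_path):
    hits = Counter()
    with open(log_path) as fh:
        for line in fh:
            m = LOG_RE.match(line)
            if m and m.group("path").startswith(("/feed", "/comments/feed")):
                hits[m.group("ip")] += 1
    return [ip for ip, n in hits.most_common() if n > FEED_THRESHOLD]

if __name__ == "__main__":
    for ip in feed_offenders("access.log"):          # assumed log file name
        print("Deny from " + ip)   # Apache 2.2 syntax; 2.4 uses Require directives
```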

    The biggie is country blocking. It works.

    MTN

    Thread Starter themadproducer

    (@themadproducer)

    @mmaunder
    Mark, sorry to bother you, but I think this one is an easy one for you to answer.
    Can you please spare a minute and help me out? (See the OP above.)
    Cheers

  • The topic ‘Pages vs Requests vs URLs vs Files – Rate Limiting’ is closed to new replies.