• Hi, I have some IP ranges and hostnames blocked with Wordfence. How can I exclude the robots.txt file from that block, so that visitors can still access robots.txt even if their IP range is blocked?

    There should be an exclusion for robots.txt so that spiders, etc. can always read it, whether their IPs are blocked or not. How can I achieve this? Is there a setting?

Viewing 2 replies - 1 through 2 (of 2 total)
  • Plugin Support wfpeter

    (@wfpeter)

    Hi @yatgirl, thanks for your message.

    Can I just clarify what your overall use-case is for blocking bots from seeing your site, but still wanting them to see/observe robots.txt?

    As legitimate bots would observe instructions in robots.txt to crawl, or not crawl, your site, the manual IP block may be unnecessary and difficult to keep up with over time when IPs and ranges change.

    I’m more than happy to suggest some Rate Limiting settings I use. I personally prefer increasing two durations to days or even months: Wordfence > All Options > Brute Force > “Amount of time a user is locked out” and Wordfence > All Options > Rate Limiting > “How long is an IP address blocked when it breaks a rule?”. This stops problematic IPs from retrying too often and avoids a manual blocking regime.

    I usually set these values to start with and adjust if needed:

    • If anyone’s requests exceed – 240 per minute
    • If a crawler’s page views exceed – 120 per minute
    • If a crawler’s pages not found (404s) exceed – 60 per minute
    • If a human’s page views exceed – 120 per minute
    • If a human’s pages not found (404s) exceed – 60 per minute
    • How long is an IP address blocked when it breaks a rule – 30 minutes (or more if you prefer)
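
    To make the thresholds above concrete, here is a minimal Python sketch of a per-minute sliding-window limiter. It only illustrates the idea and is not how Wordfence is implemented; the class name, the category keys, and the assumption that each event is recorded under exactly one category are all hypothetical.

```python
from collections import defaultdict, deque

# Per-minute thresholds copied from the list above; the key names are hypothetical.
LIMITS_PER_MINUTE = {
    "any_request": 240,
    "crawler_page_view": 120,
    "crawler_404": 60,
    "human_page_view": 120,
    "human_404": 60,
}
WINDOW_SECONDS = 60
PENALTY_SECONDS = 30 * 60  # "30 minutes (or more if you prefer)"


class RateLimiter:
    """Toy sliding-window limiter (illustration only, not Wordfence code)."""

    def __init__(self):
        self.hits = defaultdict(deque)  # (ip, kind) -> timestamps within the last minute
        self.penalized_until = {}       # ip -> time at which the penalty expires

    def allow(self, ip, kind, now):
        """Record one event at time `now` (seconds); return False if the IP is limited."""
        if self.penalized_until.get(ip, 0) > now:
            return False
        window = self.hits[(ip, kind)]
        window.append(now)
        # Drop events that are older than one minute.
        while window and window[0] <= now - WINDOW_SECONDS:
            window.popleft()
        # "Exceed" means strictly more than the threshold.
        if len(window) > LIMITS_PER_MINUTE[kind]:
            self.penalized_until[ip] = now + PENALTY_SECONDS
            return False
        return True
```

    With these numbers, a crawler that triggers 61 not-found responses within one minute would be refused on the 61st and for the following 30 minutes, while other IPs are unaffected.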

    I also always set the rule to Throttle instead of Block. With crawlers, throttling is generally better than blocking, because any good search engine understands what has happened if it is mistakenly blocked, and your site isn’t penalized because of it.

    Thanks,
    Peter.

    Thread Starter yatgirl

    (@yatgirl)

    I am not referring to good bots. I am talking about bad bots, or bots that I just don’t deem necessary to access my site: ones that I have added to my robots.txt, but which of course can’t see my robots.txt because they are blocked by my block rule in Wordfence. I also have an IP range block for Amazon AWS.

    I will suppose from your answer that there is no setting to allow an exclusion for robots.txt. I really think that robots.txt should be excluded and that everyone should have access to it.
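
    To be concrete about what I am asking for, the exclusion amounts to checking the requested path before applying the IP block. A rough Python sketch of that logic follows; it is purely illustrative, not an existing Wordfence setting or API, and the names and the example range (a documentation-only network) are hypothetical.

```python
import ipaddress

# Hypothetical block list; 203.0.113.0/24 is a documentation-only range.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
# Paths that every client may fetch, blocked or not.
ALWAYS_ALLOWED_PATHS = {"/robots.txt"}


def should_block(client_ip: str, path: str) -> bool:
    """Return True if the request should be refused."""
    # Ignore any query string so that /robots.txt?x=1 is still exempt.
    if path.split("?", 1)[0] in ALWAYS_ALLOWED_PATHS:
        return False
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in BLOCKED_NETWORKS)
```

    With this ordering, a client in a blocked range could still read robots.txt but nothing else.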

  • The topic ‘Blocking ips – how to exclude robots.txt’ is closed to new replies.