BLC 2.0 false positives when issued a CloudFlare Challenge page during scan
Broken Link Checker 2.0 looks like a promising improvement over the old local version, and I see that the generic UA used in older versions has been replaced with a unique BLC UA. This is an improvement, but it only partly solves the original problem from my previous post. That issue and the lack of a clear timeline on BLC 2.0 were significant enough that I moved all the sites I manage away from Broken Link Checker to a paid solution elsewhere.
I am now re-assessing BLC 2.0 for use on a network of sites. However, I’m running into a problem when my site links to third-party sites that use CloudFlare. Please note that I cannot change their CloudFlare settings, as I do not own those sites (one example is Columbia University’s website, https://www.columbia.edu, used in the sample request below). This problem will still require action on the part of the WPMU Dev team to resolve; see the requested steps below.
Steps to Reproduce
When I test a simulated BLC 2.0 request using the UA and X-Forwarded-For header (see below), as described by Patrick Freitas in a previous support topic, I get the following response:
Request:
curl --header "X-Forwarded-For: 165.227.127.103" -I -A "WPMU DEV Broken Link Checker Spider" "https://www.columbia.edu/"
Response:
HTTP/2 403
date: Tue, 13 Aug 2024 16:23:57 GMT
content-type: text/html; charset=UTF-8
...
cf-mitigated: challenge
...
server: cloudflare
cf-ray: 8b2a0d79697419b2-EWR
This shows that the request, using the recommended forwarding and UA info, is being blocked by a CloudFlare challenge page.
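For anyone who wants to script this check, here is a minimal sketch of how a response like the one above could be flagged as a CloudFlare challenge rather than a broken link. It only looks for the cf-mitigated: challenge header in the raw response headers. The function name classify_cf_response and the two labels it prints are my own placeholders, not anything BLC 2.0 actually uses.

```shell
# Hypothetical helper (not part of BLC): reads raw HTTP response headers
# on stdin and prints a label describing whether CloudFlare issued a challenge.
classify_cf_response() {
  # Header names are case-insensitive, so match with -i. curl ends header
  # lines with a carriage return, so the pattern is not anchored at line end.
  if grep -qi '^cf-mitigated:[[:space:]]*challenge'; then
    echo "cloudflare-challenge"
  else
    echo "not-challenged"
  fi
}

# Example usage with the same UA as the request above (needs network access):
#   curl -sI -A "WPMU DEV Broken Link Checker Spider" "https://www.columbia.edu/" | classify_cf_response
```

A scanner could use a result like "cloudflare-challenge" to report the link as unverifiable instead of broken, which is the distinction I am asking for.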
I’ve reviewed the CloudFlare list of verified bots, and it looks like WPMU Dev has not yet added BLC 2.0 to that list, though I do see a potential competitor in there ;).
Requested steps to resolve
Please look into the process for adding the "WPMU DEV Broken Link Checker Spider" UA as a verified bot on CloudFlare, and into adjusting the BLC 2.0 cloud scanner to recognize CloudFlare challenges and represent them with more meaningful information in the scan results. The development team may find it helpful to note that CloudFlare challenges can be recognized by the presence of the "cf-mitigated: challenge" header in the response.
If this is not something that the WPMU Dev team can resolve, please let me know the steps you recommend to deal with this. Again, these are outgoing links to third-party sites using CloudFlare. I do not control those sites or their CloudFlare settings, so I cannot change those settings or add any UA or IP addresses to a whitelist. Please don’t recommend that as a solution.
Looking forward to your response,
David