Crawlers, Non-Existent Page & Block for Accessing a Banned URL
-
I have set up TEMPORARY bans for accessing pages ending with ‘userinfo.php?uid=’ and ‘register.php’. I assume these pages are left over from the old site (Drupal, I think), where tons of spammers were signing up in order to post spam.
It seems that crawlers are now searching for these pages, though they were not looking for them when I first started using Wordfence.
I’m concerned because I have a temporary ban on these pages. It looks like crawl-66-249-75-1.googlebot.com is NOT banned for visiting these types of pages, but others are, such as:
31.50.49.58.broad.wh.hb.dynamic.163data.com.cn
107-175-37-248-host.colocrossing.com
hydrogen024.a.ahrefs.com
If these legit crawlers are being temporarily banned, are we going to get bumped out of rankings and search results, or stop being revisited?
Should I allow these IP addresses?
How do I find out the IPs of crawlers?
Is there an updated list of crawlers’ IPs and/or hostnames?
-
In my experience, even if those are really “crawlers”, you don’t need them. Also, I researched colocrossing quite a bit, as I was getting a lot of attacks routed through their network. In my opinion they’re terrible: they make money by allowing bad actors to use their network, and if possible it’s best to block them entirely.
When I’m curious about an individual IP number I look it up here: https://ipindetail.com/ip-blacklist-checker/31.50.49.58.html
And here:
https://www.tcpiputils.com/browse/ip-address/31.50.49.58
Sometimes it’s entirely obvious that someone is using the IP for no good, and you can proceed with blocking it if you’ve got the time.
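Many IP blacklist checkers likely work, at least in part, by querying DNS-based blacklists (DNSBLs): you reverse the IPv4 octets, append the list's zone, and do a DNS lookup, where an answer means "listed" and NXDOMAIN means "not listed". The minimal sketch below only builds that query name; the zone `dnsbl.example.org` is a hypothetical placeholder, and actually resolving the name (and following each list operator's usage terms) is left out.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Return the DNS name to look up when checking an IPv4 address against a DNSBL.

    By convention the address's octets are reversed and the blacklist's zone is
    appended. If that name resolves, the IP is listed; NXDOMAIN means it is not.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed input
    octets = str(addr).split(".")
    return ".".join(reversed(octets)) + "." + zone.strip(".")


if __name__ == "__main__":
    # "dnsbl.example.org" is a hypothetical placeholder zone, not a real blacklist.
    print(dnsbl_query_name("31.50.49.58", "dnsbl.example.org"))
    # prints 58.49.50.31.dnsbl.example.org
```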
All that said, trying to figure out individual crawlers is very time consuming. Me, I just keep a big list of URLs in the Wordfence “Immediately Block URLs” option. Bad crawlers (bots) go to the same tired old URLs over and over again. That’s their weak point, so we stick the knife in at that point and ban the heck out of them, for at least 2 days. Below is the list I use; you can pick from it or just add the whole thing and see what happens. For some weird reason WordPress doesn’t have much in the way of firewall rules that respond to attempts on the more obvious attack-vector URLs; they leave it up to us.
The list is a bit of a mess and could certainly be cleaned up. Only so much time in a day. Sometimes I could obviously have used more wildcards, but I leave the URLs individualized for later reference…
/—–NOTE–remember-url-must-not-exist-on-your-installation
/—–NOTE–dots-periods-not-substituted-by-wildcard
/wp-login
/*/wp-login
/blog/wp-login.php
/*/wp-login.php
/*/*/wp-login.php
/wp-login.php*
/login.html
/login
/author/*//wp-login.php
/author/*/wp-login.php
/author/*/wp-login.php*
/*/*login=go%21&H=
/*/*/*login=go%21&H=
/secretlogin/
/administrator/*
/administrator/index.php
/administrator
/administrator/
/*/administrator/*
/admin
/admin/
/admin.php
/adminzone
/*/node/add
/node/add
/*/*/ckeditor-for-wordpress/*
/*/ckeditor-for-wordpress/*
/*/*/thecartpress/*
/*/thecartpress/*
/data/wallet.dat
/wp-content/*/*/a-a.css
/a-a.css
/wp-content/*/*/gallery-plugin.php
/gallery-plugin.php
/whitehat
/plugins/lim4wp/editor_plugin.js
/*/plugins/lim4wp/editor_plugin.js
/xerte-online/logo.png
/*/plugins/xerte-online/logo.png
/user-photo/admin.css
/*/plugins/user-photo/admin.css
/*/mac-dock-gallery/bugslist.txt
/*/*/mac-dock-gallery/bugslist.txt
/MySQLDumper
/*/*/*/destination.php
/front-end-upload/destination.php
/*/front-end-upload/destination.php
/*/*/*/readme.txt
/wp-tmp.php
/license.php
/*/license.php
/license.php*
/lic.php
/gemb.php
/nicesite.php
/sample.php
/security.php
/tmp.php
/wp-checking.php
/——-place-all-wp-config-variations-below
/wp-config
/wp-config-sample.php
/wp-config-sample.php~
/wp-config.txt
/wp-config-sample.php.bak
/*/wp-config.txt
/wp-config.save
/wp-config.cfg
/*/wp-config.cfg
/wp-config.old
/wp-config.bak
/wp-config.orig
/*/wp-config.orig
/wp-config.original
/wp-config-backup.txt
/wp-config-backup.php
/wp-config.backup
/wp-config.data
/wp-config.htm
/wp-config.html
/.wp-config.php.swp
/config.php~
/config
/—–NOTE-be-sure-setup-config-file-deleted-in-wp-admin
/*/*/*/setup-config.php
/*/*/setup-config.php
/*/setup-config.php
/setup-config.php
/%23wp-config.php%23
/.wp-config.php.swp
/*/wp-installation.php
/wp-installation.php
/wsdl.php
/manager
/manager/
/manager/html
/*/*/*/*/*/upload_settings_image.php
/xsvip.php
/wp-mail.php
/sql_dump.php
/security.php
/wp/*
/wp-content/plugins/wp-photonav/wp-photonav.css
/wp-content/plugins/wp-photonav/*
/plus/Shijian.asp
/install/m7lrv.php
/admin/mazi.asp
/utility/convert/data/config.inc.php
/plus/mytag_js.php
/inc/config.asp
/images/cache.asp
/passwords.php
/SQLiteManager/*
/weki.php
/upload/uploaxsd.asp
/zx.asp
/jiuge.asp
/xyr/confings.asp
/xz.asp%3b.jpg
/readme.txt
/readme.html
/readme.php
/sjutd.txt
/ffl/error
/apps/
/js/libs/jquery/*/*/tipsy/css/tipsy.css
/themes/elastixneo/ie.css
/wp-content/plugins/dzs-videogallery/
/wp-content/plugins/mailz/
/wp-content/plugins/akismet/Sec-War.php
/pole.php
/*/showdebuginfo/serverDetails.asp
/uploadify/uploadify.css
/uploadify/uploadify.php
/*/*/*/*/*/uploadify.php
/*/*/*/*/*/upload_settings_image.php
/user/insert.page
/*/*/tinybrowser/upload_file.php
/wrecksite.aspx
/master/upload.php
/register/
/*/register
/*/*/register
/*/*/*/register
/register.php
/*/register.php
/login-register.html
/?q=user%2Fregister
/author/*/*?action=register
/wp-register.php
/inc.php
/seo-joy.cgi
/thumbopen.php
/*/shareChat.asp
/short-term-cash-*/
/*/*/|
/*/*/%7C
/*/*/*/*/jquery.ui.draggable.min.js
/explore
/*/*/*/fm.php
/*/*/fm.php
/*/upfilees.php
/upfilees.php
/*/*/wp-quick-booking-manager/*
/*/wp-quick-booking-manager/*
/*/*/logs/xml.log
/*/logs/xml.log
/*/*/*/MF_Constant.php
/*/*/MF_Constant.php
/utility/*/*
/typo3/
/*/typo3/
/—–NOTE-below-blocks-random-author-scans
/?author=2
/?author=4
/?author=5
/?author=7
/?author=50
/*/*/front-end-upload/destination.php
/*/*/*/wp-installation.php
/test.php
/cache/clean.php
/cache/clean.php*
/*/*/*/ninja_forms.php
/form.php
/.nksdjs
/*/*/Cms_Wysiwyg/directive/*/
/*/Cms_Wysiwyg/*/*/
/*/*/delete-all-comments/*
/*/*/delete-all-comments/
/xml.log
/*/xml.log
/*/*/xml.log
/*/*/*/xml.log
/*/*/*/*/xml.log
/*/*/cielo-xml.log
/wso.php.suspected
/wso.php
/c99.php
/mko.php
/tmp.php.suspected
/bubus.php
/bubus.php.suspected
/*/*/*/README_OFFICIAL.txt
/*/*/*/lgpl.txt
/product.php
/product.php/
/product.php*
/wp-content/*/smart-videos/*
/wp-content/*/zen-mobile-app-native/*
/blog/
/*/*/mobile-app-builder-by-wappress/*
/autodiscover.wildsnow.com/*/*
/*/Exchange.asmx
/bitrix
/*bitrix/
/plugins/stop-user-enumeration/
/*/changelog.txt
/*/*/changelog.txt
/*/*/*/changelog.txt
/c3843fdbd548cf7a5c0d3cf617492957.html
/wp-admin/js/wp-fullscreen.js
/layout2b.css
/——NOTE–wordfence-firewall-might-actually-block-following-miracles-never-cease
/*/revslider/*/*
/*/revslider/*/*/*
/*/*/revslider/*/*
/*/*/revslider/*/*/*/*
/*/*/Login-wall-OaWAc/*
/media/mass.php
/mscms/
/vam_rss2_info.php
/*/weathermap/editor.php
/*/*/weathermap/editor.php
/*/*/wp-dreamworkgallery/*
/*/*/wp-vertical-gallery/*
/*/*/complete-gallery-manager/*/*
/*/complete-gallery-manager/*/*
/*/*/complete-gallery-manager/
/about/xmlrpc.php
/fozi.php
/Leonas.php
/wrm.php
/*/*/wp2android-turn-wp-site-into-android-app/*
/*/*/mobile-app-builder-by-wappress/*
/*/*/zen-mobile-app-native/*
/wp-links-opml.php
/*/*/wp-property/action_hooks.php
/*/*/custom-content-type-manager/index.html
/dzsuploader/*
/*/*/index.php?php5=print(md5(wp))
/wp-layout.css
/layout2b.css
/cash.php
/*/*/*/libravatar-replace.php
/*/*/wpstorecart/*
/wp-content/*/showbiz/*/*
/wp-content/*/showbiz/*/
/editor/filemanager/connectors/uploadtest.html
/*/*/*/uploadtest.html
/*/*/uploadtest.html
/Mksfsxcb.php
/*/wp/v2/*
/*/*/website-contact-form-with-file-upload/*
/images/stories/gass.php*
/up.php*
/m.php*
/jm-ajax/upload_file/
/894613256498.php
/07545460.php
/————-following blocks access to all gzip and sql if not existing
/*.gz
/*.sql
/*/wp/v2/*/*
/*/*/rnnvhs.php
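A hand-maintained list this long is easy to get wrong, and the first note in it warns that a blocked URL must not exist on your own installation. The sketch below checks a list before you paste it in: it flags entries without a leading slash, exact duplicates, and patterns that would match a real page on your site. It rests on an assumption, stated again in the code: `*` is treated as matching any run of characters except `/` (one path segment at most), which would explain why the list spells out `/*/wp-login.php` and `/*/*/wp-login.php` separately. Wordfence's actual matching rules may differ, so treat this as an offline sanity check, not a model of the plugin.

```python
import re
from collections import Counter
from typing import Iterable, List


def pattern_to_regex(pattern: str):
    # ASSUMPTION: '*' matches any run of characters except '/', so each '*'
    # covers at most one path segment; everything else is literal.
    # Wordfence's real matching rules may differ.
    return re.compile("[^/]*".join(re.escape(chunk) for chunk in pattern.split("*")))


def is_blocked(path: str, patterns: Iterable[str]) -> bool:
    """True if the request path fully matches any block-list pattern."""
    return any(pattern_to_regex(p).fullmatch(path) for p in patterns)


def lint_block_list(patterns: List[str], site_paths: Iterable[str]) -> List[str]:
    """Return warnings about malformed, repeated, or self-blocking entries."""
    warnings = []
    for p in patterns:
        if not p.startswith("/"):
            warnings.append(f"missing leading slash: {p}")
    for p, count in Counter(patterns).items():
        if count > 1:
            warnings.append(f"duplicate ({count}x): {p}")
    for path in site_paths:  # pages that really exist on your site
        for p in patterns:
            if pattern_to_regex(p).fullmatch(path):
                warnings.append(f"would block a real page: {path} (pattern {p})")
    return warnings


if __name__ == "__main__":
    # Illustrative data only: a few entries in the style of the list above,
    # plus hypothetical pages that exist on "your" site.
    patterns = ["/wp-login.php*", "/*/wp-login.php", "/security.php",
                "/security.php", "wp-config-sample.php.bak", "/blog/"]
    site_paths = ["/blog/", "/about/"]
    for warning in lint_block_list(patterns, site_paths):
        print(warning)
    print(is_blocked("/blog/wp-login.php", patterns))   # True
    print(is_blocked("/a/b/c/wp-login.php", patterns))  # False under the assumption
```

If your site really does have a `/blog/` section, for example, the last check is what catches the list blocking your own visitors.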
P.S. Forgive me for stating the obvious, but it should be said that IP numbers change hands all the time, so permanent bans by IP number are generally considered the wrong way to go about all this. Temporary bans are better, and in my experience longer spans work best: days, not hours.
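Because IPs get reassigned, a ban should carry its own expiry rather than last forever. A minimal in-memory sketch of that idea, assuming a 2-day default to match the "at least 2 days" mentioned earlier in the thread; the class name `TempBanList` is hypothetical, and a real firewall like Wordfence persists and shares bans in its own way.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


class TempBanList:
    """Hypothetical in-memory ban table: each banned IP has its own expiry time."""

    def __init__(self, default_duration: timedelta = timedelta(days=2)):
        self.default_duration = default_duration
        self._expires: Dict[str, datetime] = {}  # ip -> moment the ban lifts

    def ban(self, ip: str, now: datetime, duration: Optional[timedelta] = None) -> None:
        length = duration if duration is not None else self.default_duration
        self._expires[ip] = now + length

    def is_banned(self, ip: str, now: datetime) -> bool:
        expiry = self._expires.get(ip)
        if expiry is None:
            return False
        if now >= expiry:
            del self._expires[ip]  # lazily drop expired bans so the IP is free again
            return False
        return True


if __name__ == "__main__":
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    bans = TempBanList()
    bans.ban("203.0.113.7", start)  # documentation-range IP, illustrative only
    print(bans.is_banned("203.0.113.7", start + timedelta(days=1)))  # True
    print(bans.is_banned("203.0.113.7", start + timedelta(days=3)))  # False
```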
On the other hand, I do add some permanent IP number blocks in my .htaccess file, for example to block known site scrapers whose IPs don’t seem to change often, or obvious attack IPs I see over and over again for weeks. Sometimes I even put those in my server firewall. (Thankfully, Wordfence began maintaining its own blacklist a while back, which noticeably reduced the need for this, though I’m constantly surprised by what is clearly not on the Wordfence IP blacklist.)
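For the permanent .htaccess blocks, a typo in an IP can silently break the rule or the whole config. The sketch below validates each entry with Python's `ipaddress` module and renders an Apache 2.4 style fragment (`Require not ip` inside `<RequireAll>`). It assumes Apache 2.4 with overrides allowed for these directives; the function name is hypothetical and the IPs are documentation-range placeholders, not real offenders.

```python
import ipaddress
from typing import Iterable


def htaccess_ip_block(entries: Iterable[str]) -> str:
    """Render an Apache 2.4 .htaccess fragment that denies the given IPs/CIDRs.

    Every entry is parsed by the ipaddress module (ValueError if malformed),
    and duplicates are dropped while keeping the original order.
    """
    networks = []
    for raw in entries:
        net = ipaddress.ip_network(raw.strip(), strict=False)
        if net not in networks:
            networks.append(net)
    lines = ["<RequireAll>", "    Require all granted"]
    for net in networks:
        # Single hosts are written without a /32 or /128 suffix.
        text = str(net.network_address) if net.num_addresses == 1 else str(net)
        lines.append(f"    Require not ip {text}")
    lines.append("</RequireAll>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(htaccess_ip_block(["203.0.113.7", "198.51.100.0/24", "2001:db8::/32", "203.0.113.7"]))
```

Keeping the list in a plain file and regenerating the fragment makes it easy to prune entries later, which matters given that IPs change hands.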
I also use quite a bit of country blocking, which I find very effective at cutting down the amount of bad traffic. I base that decision on the ratio of legit traffic to obviously bad traffic. It saves me from paying for more server resources; it’s purely an economic decision. Not for everyone, but worth trying. In my experience, nearly all my traffic from certain countries is obviously criminal, so blocking it is a no-brainer.
MTN
Hi @learning22
Since these links were publicly accessible at one time (and spam users were registered), it’s quite possible that crawlers can find their way to these links now. One reason the “googlebot.com” crawler is not being blocked could be that Google’s crawlers have unlimited access to your website, based on the “How should we treat Google’s crawlers” option.
Finally, I haven’t heard of the other crawlers you mentioned, and I doubt that blocking any of them would affect your website’s ranking in any way. I do agree with “mountainguy2” that temporary blocking is your best bet, because IPs are assigned dynamically and a permanent block might end up blocking a legit IP after a while.
I’m not aware of such a list of crawler hostnames/IPs, but you can check each crawler’s website for that information, or watch the “Live Traffic” log: every crawler IP that has visited your website (though of course not the crawler’s entire IP range) should appear there.
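Rather than relying on a list of crawler IPs, a common way to check whether a visitor really belongs to a crawler is forward-confirmed reverse DNS, which Google documents for verifying Googlebot: reverse-resolve the IP, check that the hostname ends in the crawler's domain (for example googlebot.com, as in crawl-66-249-75-1.googlebot.com above), then forward-resolve that hostname and confirm it maps back to the same IP. The sketch below assumes you take the allowed domains from each crawler operator's own documentation; the resolver functions are injectable so the logic can be tested offline. `socket.gethostbyname_ex` only returns IPv4 addresses, so IPv6 visitors would need `socket.getaddrinfo` instead.

```python
import socket
from typing import Callable, Iterable


def verify_crawler(ip: str,
                   allowed_domains: Iterable[str],
                   reverse: Callable = socket.gethostbyaddr,
                   forward: Callable = socket.gethostbyname_ex) -> bool:
    """Forward-confirmed reverse DNS check for an IP claiming to be a crawler."""
    try:
        hostname = reverse(ip)[0]  # (hostname, aliases, addresses)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    hostname = hostname.rstrip(".").lower()
    # Require an exact domain or a true subdomain, so "googlebot.com.evil.example" fails.
    if not any(hostname == d or hostname.endswith("." + d) for d in allowed_domains):
        return False
    try:
        _, _, addresses = forward(hostname)  # IPv4 only
    except OSError:
        return False
    return ip in addresses  # the hostname must point back to the same IP


if __name__ == "__main__":
    # Performs real DNS lookups; results depend on your network and resolver.
    print(verify_crawler("66.249.75.1", ["googlebot.com", "google.com"]))
```

A spammer can set any reverse DNS name on an IP they control, but cannot make Google's forward DNS point back at it, which is why both steps are needed.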
Thanks.
Hello!
I hope we were successful in helping you resolve your issue with Wordfence! Since we have not heard back from you in the past 2 weeks I will now be marking this support thread as resolved. However, if we still haven’t resolved your issue please reach out to us as we would be more than happy to further assist you!
Thanks and have a great day!
Chloe