My WordPress site is running in a directory on a company website…

    I have a standard robots.txt file, generated by a website, in the folder where my WP site resides, but bots are still crawling the site. Also, my posts are appearing on Technorati, which I don’t want, as the site has been designed for family use and not for the wider public.

    How do I go about stopping bots from crawling the WP folder?

Viewing 11 replies - 1 through 11 (of 11 total)
    Is your site at something like example.com/subdirectory/yoursite?

    If so, robots.txt won’t work.

    Bots only read robots.txt when it is placed at example.com/robots.txt; they don’t check anywhere else. You could ask your system administrator whether they are willing to add something to the site’s robots.txt, but it is much better to simply add the robots meta tag to your <head>:

    <meta name="ROBOTS" content="NOINDEX, NOARCHIVE, NOFOLLOW">

    Or, if you use XHTML:

    <meta name="ROBOTS" content="NOINDEX, NOARCHIVE, NOFOLLOW" />

    Put it in the <head> section of your theme’s header.php and it will appear on every page of your site. Robots that honour it will not put your pages in their index or their cache, and will not follow the links on your pages.
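    For anyone curious how a well-behaved crawler might act on that tag, here is a minimal sketch using only Python’s standard html.parser. The names RobotsMetaParser and robots_directives are made up for illustration; real search engines use their own parsers, and obeying the tag is entirely up to the bot.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        # XHTML-style <meta ... /> also ends up here via handle_startendtag.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Accept both comma- and space-separated directive lists.
        for token in content.replace(",", " ").split():
            self.directives.add(token.lower())


def robots_directives(html):
    """Return the set of lowercase robots directives found in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="ROBOTS" content="NOINDEX, NOARCHIVE, NOFOLLOW" /></head></html>'
print(sorted(robots_directives(page)))  # ['noarchive', 'nofollow', 'noindex']
```

    A crawler that respects the tag would skip indexing when "noindex" is in that set and skip following links when "nofollow" is.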

    You can also disallow crawling from within the root robots.txt document.

    So, using the above example, if the site you’re trying to “hide” is in a subdirectory, you can add this to your root robots.txt file:

    User-agent: *
    Disallow: /subdirectory

    This will keep all well-behaved bots from crawling that particular directory. Then add the meta tag to the <head> section of your header.php file as an extra safety measure.
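    Python’s standard urllib.robotparser module implements robots.txt matching, so it offers a quick way to check what a compliant crawler would make of those two lines. The URLs are placeholders and "Googlebot" is just an example user agent; this only evaluates the rules, it does not stop a crawler that ignores them.

```python
from urllib.robotparser import RobotFileParser

# The rule from the post above; "/subdirectory" is a placeholder folder name.
ROBOTS_TXT = """\
User-agent: *
Disallow: /subdirectory
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawl_allowed(url, agent="Googlebot"):
    """Ask the parsed rules whether a polite crawler may fetch url."""
    return parser.can_fetch(agent, url)


for url in (
    "https://example.com/subdirectory/2008/01/family-picnic/",
    "https://example.com/products/",
    "https://example.com/subdirectory-news/",
):
    print(url, crawl_allowed(url))
# Expected: False, True, False (Disallow rules are plain path-prefix matches).
```

    The last URL shows that Disallow is a prefix match: /subdirectory also blocks /subdirectory-news/. Writing Disallow: /subdirectory/ with a trailing slash limits the rule to paths inside that folder.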

    Another suggestion would be to password-protect the directory and give the family members the password. Bots can’t crawl password-protected pages, and if the site is only for a small group of family members, this shouldn’t be too much of a problem.
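    On Apache this is normally done with an .htaccess file using HTTP Basic authentication. The Python WSGI sketch below is not that configuration; it only imitates the mechanism so it is easy to see why crawlers are shut out: a request without a valid Authorization header gets 401 Unauthorized. The user name, password and realm are placeholders.

```python
import base64
import hmac

# Hypothetical credentials shared with the family; not from the thread.
USERNAME = "family"
PASSWORD = "change-me"


def credentials_ok(auth_header):
    """Validate an HTTP 'Authorization: Basic <base64>' header value."""
    scheme, _, encoded = auth_header.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return False
    try:
        decoded = base64.b64decode(encoded, validate=True)
    except ValueError:  # covers binascii.Error (bad base64) and non-ASCII input
        return False
    user, sep, password = decoded.partition(b":")
    if not sep:
        return False
    # compare_digest runs in constant time, so timing reveals less about the password.
    user_ok = hmac.compare_digest(user, USERNAME.encode("utf-8"))
    pass_ok = hmac.compare_digest(password, PASSWORD.encode("utf-8"))
    return user_ok and pass_ok


def family_blog(environ, start_response):
    """WSGI app standing in for the password-protected blog directory."""
    if not credentials_ok(environ.get("HTTP_AUTHORIZATION", "")):
        start_response("401 Unauthorized", [
            ("WWW-Authenticate", 'Basic realm="Family blog"'),
            ("Content-Type", "text/plain"),
        ])
        return [b"Login required.\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Welcome back!\n"]


def status_for(auth_header=None):
    """Call the app directly, as a server would, and return the status line."""
    environ = {} if auth_header is None else {"HTTP_AUTHORIZATION": auth_header}
    seen = []
    family_blog(environ, lambda status, headers: seen.append(status))
    return seen[0]


token = base64.b64encode(b"family:change-me").decode("ascii")
print(status_for())                  # 401 Unauthorized (a crawler without the password)
print(status_for("Basic " + token))  # 200 OK (a family member with the password)
```

    Basic authentication sends the password only base64-encoded, so it is worth serving the protected folder over HTTPS if the company site supports it.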

    doodlebee: the problem here is that Parb (I assume) can’t put robots.txt at the root level. If it’s a company website, I would assume access is only granted to the specific subdirectory, and bots don’t check for robots.txt anywhere but the root.
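    A quick way to see this: a crawler derives the robots.txt location from the scheme and host alone, which Python’s urljoin reproduces when given an absolute path. This is a sketch of the rule, not code from any particular crawler; example.com/subdirectory/yoursite is the placeholder used earlier in the thread.

```python
from urllib.parse import urljoin


def robots_txt_url(page_url):
    """Return the one robots.txt location a crawler checks for page_url."""
    # An absolute path ("/robots.txt") replaces the whole path of page_url,
    # leaving only the scheme and host (plus port, if any).
    return urljoin(page_url, "/robots.txt")


blog_page = "https://example.com/subdirectory/yoursite/2008/01/family-picnic/"
print(robots_txt_url(blog_page))  # https://example.com/robots.txt

# A robots.txt sitting in the blog's own folder is never consulted.
local_copy = "https://example.com/subdirectory/yoursite/robots.txt"
print(robots_txt_url(blog_page) == local_copy)  # False
```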

    Moderator Samuel Wood (Otto)

    (@otto42)

    www.ads-software.com Admin

    Also, my posts are appearing on Technorati, which I don’t want, as the site has been designed for family use and not for the wider public.

    Remove rpc.pingomatic.com from the update services list in the admin panel.
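    Each address in that update services list is an XML-RPC endpoint that WordPress notifies whenever you publish. Ping-O-Matic passes the notice on to other services (Technorati was among the ones it notified at the time), which is the likely route by which the posts turned up there. The sketch below only builds the classic weblogUpdates.ping request with Python’s xmlrpc.client to show what such a ping carries; WordPress’s own ping may use an extended variant of the method, and the blog name and URL are made up.

```python
import xmlrpc.client

# Hypothetical blog details, for illustration only.
BLOG_NAME = "Family Blog"
BLOG_URL = "https://example.com/subdirectory/yoursite/"

# Build (but do not send) a weblogUpdates.ping request: the method name plus
# two string parameters, the blog's name and its URL.
payload = xmlrpc.client.dumps((BLOG_NAME, BLOG_URL), methodname="weblogUpdates.ping")
print(payload)

# Sending it would look roughly like this, the kind of request an empty
# update services list stops WordPress from making:
#   xmlrpc.client.ServerProxy("http://rpc.pingomatic.com/").weblogUpdates.ping(BLOG_NAME, BLOG_URL)
```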

    Good point, maerk. But just in case he *does* have access, it’s still good to know.

    Hi,

    given the recent wave of comment spam on the net, I put several gallery scripts behind an .htaccess password and set up a “public access” page on which a GIF image shows the user name and password needed to enter the galleries.

    This way comment spam has stopped completely on these scripts, yet real people can still get in normally and leave comments, even if they are not registered.

    This could also be implemented on the blog, opening it up to a wider public of real human visitors.

    As to the pinging/Technorati settings, I have always thought it is a very bad idea to have this enabled by default on install. I’d really suggest that the WP devs disable it in the standard install in future. Those who want it can enable it, while newbies who don’t want it would no longer have it switched on and get swamped from the start.

    Thread Starter parb

    (@parb)

    maerk, you’re right: the WP site is in a subdirectory on a company website, so it wouldn’t be possible to put the robots.txt file at the top level.

    Thanks to both you and doodlebee for your advice. I’ve implemented both to err on the side of caution, so hopefully I’ll see no more bots!

    Otto42, I’m not entirely sure where I should be looking. I’ve hunted high and low but can’t find what you mentioned in my admin pages…

    Options > Writing

    It’s in a large textbox under the heading “Update Services”

    Just delete everything there.

    Thread Starter parb

    (@parb)

    Fantastic, found it.

    Thanks guys, all of your advice has been bang on the money!

    robots.txt only works for “good” bots. For bad bots it serves as a road map, telling them exactly where to go (where you *don’t* want them to go). The only way I have found to deal with these bots is through bot-blocking scripts and .htaccess rules.

    This is an old post, but another way to have done it would have been to get a list of bot referers and block them in .htaccess, which works in any subdirectory (as long as the server allows .htaccess overrides there).
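    Apache handles this with mod_rewrite or SetEnvIf rules in .htaccess. To show the logic without tying it to a particular server config, here is a Python WSGI middleware sketch that refuses requests whose User-Agent or Referer matches a blocklist. The keyword list and the referer host are placeholders. As the reply above points out for robots.txt, this only stops bots that identify themselves honestly, since both headers are trivial to fake.

```python
from urllib.parse import urlsplit

# Hypothetical blocklists; a real setup would use a maintained list.
BLOCKED_AGENT_WORDS = ("bot", "crawler", "spider", "slurp")
BLOCKED_REFERER_HOSTS = ("spam-referrer.example",)


def is_blocked(user_agent, referer=""):
    """True if the User-Agent or the Referer's host matches a blocklist."""
    agent = (user_agent or "").lower()
    if any(word in agent for word in BLOCKED_AGENT_WORDS):
        return True
    host = (urlsplit(referer or "").hostname or "").lower()
    # Match the listed host itself and any of its subdomains.
    return any(host == h or host.endswith("." + h) for h in BLOCKED_REFERER_HOSTS)


def block_bots(app):
    """WSGI middleware that answers 403 Forbidden to blocklisted requests."""
    def wrapper(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "")
        referer = environ.get("HTTP_REFERER", "")
        if is_blocked(agent, referer):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return wrapper


print(is_blocked("Googlebot/2.1 (+http://www.google.com/bot.html)"))  # True
print(is_blocked("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))        # False
```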

The topic ‘How To Stop Bots?…’ is closed to new replies.