• I have a multisite WordPress installation with sub-domains, and today I got a message from Google that their crawlers are being blocked by robots.txt. Whatever I do, the robots.txt file on the subdomains stays the same, while the main domain’s robots.txt is fine and responds to the solutions I’ve tried.

    Here’s what the subdomains’ robots.txt says:
    User-agent: *
    Disallow: /

    Here is what I tried:
    – Made a custom robots.txt file in the root directory; this has no effect on the sub-domains, only on the main domain
    – Rebuilt the network from scratch with both Softaculous and the www.ads-software.com old and new versions, without plugins or themes and with a fresh database; same result
    – Even deleted the WordPress function in functions.php that generates the virtual robots.txt file
    – Tried various plugins that manipulate the robots.txt file; none has any effect on the sub-domains

    Whatever I do has an effect on the root domain, but on the subdomain blogs it has no effect whatsoever.

    How is it possible that WordPress is still generating a blocking robots.txt file even though I deleted its function for doing so and I have a correct handmade file in the root directory? Is there any other function that handles the virtual robots.txt?

    Does anyone have any other suggestion that I haven’t tried? I don’t know what else to do… This used to work without problems; now, all of a sudden, whatever I do has no effect.

    Thank you

Viewing 8 replies - 1 through 8 (of 8 total)
  • Moderator Ipstenu (Mika Epstein)

    (@ipstenu)

    Advisor and Activist

    Per site, if you go into Reading Settings, there’s a checkbox for “Discourage search engines from indexing this site”

    Is that box UN-checked on all sites?

    (The physical robots.txt file should always override the ones seen by subsites. You’re checking by going to sub.domain.com/robots.txt?)

    Thread Starter rockeru

    (@rockeru)

    Of course it’s unchecked on all the sub-blogs. But through hours and hours of testing I’ve discovered something else.

    If I delete WordPress and my robots.txt file and then go to any randomsubdomain.mysite.robots.txt, it always gives me the wrong, blocking robots.txt file, even though there is no such file in the root and there is no longer a WordPress install to generate one virtually. That leads to the conclusion that it isn’t a WordPress issue.

    I’ve contacted my host and told them about the problem, but they haven’t figured it out yet. If anyone can give me a hint as to what can cause this in a hosting account, it would be very helpful.

    Try using only this:
    User-agent: *
    Disallow:

    The / after Disallow: may be understood by search engines as disallow all.
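
    The effect of an empty Disallow: versus Disallow: / can be checked locally with Python’s standard urllib.robotparser module. This is only a sketch: the helper name can_crawl, the "Googlebot" agent string, and the example page URL are assumptions made for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_lines, url, agent="Googlebot"):
    """Parse robots.txt lines and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


# The file the subdomains currently serve: blocks every path.
blocked = ["User-agent: *", "Disallow: /"]
# The suggested replacement: an empty Disallow places no restriction.
open_rules = ["User-agent: *", "Disallow:"]

# Hypothetical page on a subdomain, used only for the comparison.
page = "http://sub.domain.com/some-post/"
print("Disallow: /  ->", can_crawl(blocked, page))
print("Disallow:    ->", can_crawl(open_rules, page))
```

    With this parser, the first rule set refuses the page and the second allows it.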

    Thread Starter rockeru

    (@rockeru)

    Use it how? That is the blocking robots file that I’ve been getting no matter what I do for the past few days. And even if I make my own file without the / after Disallow and without any WordPress in the root, I still get:
    User-agent: *
    Disallow: /
    This happens only in sub-domains on a wildcard record.

    As you have tested and reported above, that robots.txt file is not created by WordPress. Also, your host could not figure out where it comes from. As you mentioned, if there is no such visible file, where does it come from? Perhaps a plugin (or theme) is outputting it? Likely candidates are an SEO plugin or a security plugin.

    Thread Starter rockeru

    (@rockeru)

    I had previously renamed the wp-content folder, and now I have tried deleting all plugins and themes, but still no effect. I also deleted the whole second site/WordPress installation I had in a subfolder on an addon domain; that didn’t solve anything either.

    My guess is a config issue upstream. It is easy to add rules to Apache httpd (the only HTTP server I am familiar with) to automatically generate robots.txt.

    I would try a few things like this to rule certain things out:
    1) Create a file, e.g. testfile.txt, and place it in the root directory and in a subdirectory (/test). Access it from the main domain and the subdomains via a browser. This will confirm that the httpd mappings work properly.
    2) Drop the robots.txt in the root directory and in a subdirectory, e.g. /test/robots.txt. Can you access the files with a browser, on the main domain and the subdomains?

    If you can access the test files on the main domain and the subdomains, but still cannot access the robots.txt on the subdomain (and in the test directory), then I would say it is an httpd config problem.
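
    The same checks can be scripted instead of done by hand in a browser. The following is a minimal sketch using only Python’s standard library; the function names (fetch_status, build_checks, run_checks) are made up for this example, and domain.com / sub.domain.com are placeholders for the real main domain and a real subdomain.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return (HTTP status, first 200 chars of body), or (None, error text)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read(200).decode("utf-8", errors="replace")
            return resp.status, body
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code, ""
    except OSError as err:
        # DNS failure, refused connection, timeout, and similar.
        return None, str(err)


def build_checks(hosts, paths):
    """List every host/path combination to request."""
    return [f"http://{host}{path}" for host in hosts for path in paths]


def run_checks(hosts, paths, fetch=fetch_status):
    """Fetch each URL, print one line per result, and return all results."""
    results = []
    for url in build_checks(hosts, paths):
        status, body = fetch(url)
        print(url, status, repr(body[:60]))
        results.append((url, status, body))
    return results
```

    For example, run_checks(["domain.com", "sub.domain.com"], ["/testfile.txt", "/test/testfile.txt", "/robots.txt", "/test/robots.txt"]) prints one line per URL. If the test files come back identical on both hosts while robots.txt differs only on the subdomain, that points toward the server configuration rather than WordPress.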

    Moderator Ipstenu (Mika Epstein)

    (@ipstenu)

    Advisor and Activist

    Does your web host have some sort of control panel (cPanel or custom) that has an option to block things?

  • The topic ‘robots.txt is blocking search engines on subdomain blogs whatever I do’ is closed to new replies.