• Resolved navienavnav

    (@navienavnav)


    Hello. I am sorry if I am bringing this topic up again, but I did do the research before starting a new thread. Google is indexing the WordPress folders and their contents, and it is showing weird “fatal error” PHP links in the search results. I learned about robots.txt and placed one in my site’s /public_html folder, but it still hasn’t helped. Here are the details.

    First of all, I’d be REALLY glad if someone could verify the content of the robots.txt file that I’ve written. Please note that my WordPress folders are not in the /public_html folder but in /public_html/blog/.

    Content of the robots.txt file:
    ———————————–

    User-agent: *
    Disallow: /blog/wp-admin/
    Disallow: /blog/wp-includes/
    Disallow: /blog/wp-content/

    Allow: /

    ———————————–

    I also read somewhere that the robots.txt file should be served as type text/plain rather than text/html. To ensure that, I added a .htaccess file to my public_html folder with the following content:

    AddType text/plain .txt
    php_value auto_append_file none
    php_value auto_prepend_file none

    In my Google Webmaster Tools account, it shows the following after downloading my robots.txt file:

    Allowed by line 6: Allow: /
    Detected as a directory; specific files may have different restrictions

    Is there a problem, or is the robots.txt file okay? Please help me. Google is still showing junk in its search results. And one more thing: on the Google Webmaster Tools dashboard, it shows “no data available” for everything.

  • Moderator James Huff

    (@macmanx)

    Your problem is the Allow: / line. You’re basically telling all robots that they can’t index wp-admin, wp-includes, and wp-content, but they are allowed to index everything regardless. It’s a contradiction, and when a robot encounters a contradiction within a robots.txt file, it ignores it completely.

    As for the “Content of the robots.txt file” heading in the file, leave that out; keep the entire file as simple as possible.

    Try this instead:

    User-agent: *
    Disallow: /blog/wp-admin/
    Disallow: /blog/wp-includes/
    Disallow: /blog/wp-content/

    Thread Starter navienavnav

    (@navienavnav)

    Thank you for your reply, but will the above rules allow the bots to index things other than the specified folders? Are you sure that everything else won’t be blocked?

    Moderator James Huff

    (@macmanx)

    Yes, the robots.txt rules that I provided will only block robots from visiting the specified directories and anything below those directories, like /wp-content/plugins/. They will visit and index everything else.
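
    For example (these URLs are just made up to illustrate how the rules match, assuming your blog lives under /blog/):

    Blocked: /blog/wp-admin/options.php
    Blocked: /blog/wp-content/plugins/akismet/akismet.php
    Crawled: /blog/
    Crawled: /blog/2009/05/some-post/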

    Thread Starter navienavnav

    (@navienavnav)

    Hello, thanks for your response. I think it’s working fine now. I took your advice and used your robots.txt content. The old entries that were showing up in Google search, I removed using the URL removal tool in Webmaster Tools. It has accepted the robots.txt file and shown no errors. Let’s hope that when the Google spider crawls my site again, it doesn’t index anything that should be blocked by robots.txt. I guess that’s the only time we can actually find out whether robots.txt is doing its job.

    Thanks for your really fast and really helpful guidance.

    Navneet Khare.

    Moderator James Huff

    (@macmanx)

    You’re welcome!

    I have a similar problem: Google is indexing my posts three times:
    1. the post itself, for example: https://www.recipetrezor.com/oregano-marinade-recipe/
    2. the register link: https://www.recipetrezor.com/oregano-marinade-recipe/?action=register
    3. the lost-password link: https://www.recipetrezor.com/oregano-marinade-recipe/?action=lostpassword

    I have the following in my robots.txt file

    User-agent: *
    Disallow: /?*
    Disallow: /wp-includes/
    Disallow: /wp-content/plugins/
    Disallow: /wp-login/
    Disallow: /wp-content/themes/
    Disallow: /wp-content/cache/
    Disallow: /trackback/
    Disallow: /wp-admin/
    Disallow: /comments/

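    I also wonder whether my Disallow: /?* line even matches those URLs, since the ?action= part comes after the post slug rather than right after the domain. Maybe a wildcard rule would be needed instead (I believe Googlebot understands * in robots.txt patterns, but I’m not certain), perhaps adding a line like this to the User-agent: * block:

    Disallow: /*?action=

    That is only a guess on my part, though.
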
    Could someone shed some light on this, or is it anything I should worry about at all?

    thank you,

    Hi, on a similar note: I was reading about how duplicate content can cause Google ranking problems. So I created a robots.txt in my root, installed the XML Sitemap plugin (without the virtual robots.txt option), and checked to see if it was OK in my Google Webmaster Tools account. It was; the sitemap.xml had a green tick (that’s the good part). However, when I use the Crawler Access tool (in Google Webmaster Tools) to test the robots.txt file, I get the following:

    Allow
    Detected as a directory; specific files may have different restrictions

    What does this mean?

    I have no PHP experience, am new to WordPress, and would appreciate any help. I chose WordPress on the recommendation of a friend. The site is a new, simple site for a non-profit/charitable organization. I would like it to rank higher in Google. It has loads of well-thought-out, original content. Any help would be appreciated.

Viewing 7 replies - 1 through 7 (of 7 total)
  • The topic ‘Google Indexing WordPress files and folders. Even after placing robots.txt’ is closed to new replies.