• Resolved xxxhoop

    (@xxxhoop)


    For some reason Google cannot access my robots.txt file, and thus my pages can no longer be indexed. Everything was OK until recently. I have looked for all kinds of anomalies, but so far I’m completely stumped. Could someone have hacked the site and blocked Google from crawling it?

    If so where do i look?

Viewing 4 replies - 1 through 4 (of 4 total)
  • Plugin Author Sybre Waaijer

    (@cybr)

    When Google can’t access your robots.txt-file, they deem it as “non-directing,” which means they’re free to crawl your site.

    But, I think you’ve misinterpreted the signal Google gave you. Did you type in the robots.txt URL in the Search Console search bar at the top? Via that bar, you’ll inspect the URL as a regular page, instead of a file.

    We recently added an X-Robots-Tag: noindex header to the robots.txt file. This header still allows search engines, like Google, to read and use the file, but they won’t show that file anymore on their search result pages. It is why Google now states:
    “URL is not available to Google… (it cannot be indexed).”

    That statement does not take into account that the URL is an actual robots.txt file. It’s wrong; it should read:
    “URL is available to Google, but not shown in Google Search (it cannot be indexed).”
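To see the header for yourself, a minimal Python sketch like the one below could fetch a site's robots.txt and report how the response reads. It is not part of the plugin; the function names and the user-agent string are made up for illustration, and the `noindex` check is a simple substring match rather than a full parser of the header.

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError


def describe_robots_response(status, x_robots_tag):
    """Summarise a robots.txt response from its status and X-Robots-Tag value."""
    if status != 200:
        return f"could not be fetched (HTTP {status})"
    # Simplification: a plain substring match, not a full directive parser.
    if "noindex" in (x_robots_tag or "").lower():
        return "readable by crawlers, but not shown in search results"
    return "readable by crawlers, and may be shown in search results"


def check_robots(site_url):
    """Fetch <site_url>/robots.txt and describe the response (hypothetical helper)."""
    url = site_url.rstrip("/") + "/robots.txt"
    req = Request(url, headers={"User-Agent": "robots-check-sketch"})
    try:
        with urlopen(req, timeout=10) as resp:
            return describe_robots_response(
                resp.status, resp.headers.get("X-Robots-Tag")
            )
    except HTTPError as err:
        return describe_robots_response(err.code, None)
    except URLError as err:
        return f"could not be fetched ({err.reason})"
```

Calling `check_robots("https://example.com")` with your own domain should report the "readable by crawlers, but not shown in search results" case when the noindex header is present.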

    You can find these changes in our 4.0 changelog.

    Google removed the link to inspect the robots.txt file (as intended) via their Search Console; I hope they’ll add it back again soon. For now, you can see a link in their documentation: https://support.google.com/webmasters/answer/6062598.

    > Open Robots.txt Tester

    • This reply was modified 5 years, 6 months ago by Sybre Waaijer. Reason: clarity
    Thread Starter xxxhoop

    (@xxxhoop)

    When I type in the robots.txt URL in Google’s URL inspection, as you suggested, I get this:
    Indexing request rejected
    During live testing, indexing issues were detected with the URL

    This is the same message I get on all posts and pages.

    When I try to test the robots.txt here: https://www.google.com/webmasters/tools/robots-testing-tool?hl=en&siteUrl=https://www.smallbusinesscapital.cf/

    I get the following:

    robots.txt fetch failed
    You have a robots.txt file that we are currently unable to fetch. In such cases we stop crawling your site until we get hold of a robots.txt, or fall back to the last known good robots.txt file.

    I use your plugin on other sites, even two on the same hosting package as this site, with no issues. You have an A+ SEO plugin for sure. I am just concerned that the crawl block is happening on this particular site only, which is why I suspect a hacking job.

    NOTE:

    On the robots.txt Tester at the bottom, if I test any post or page on the site for Googlebot, I get a positive result, as in “allowed”.

    Plugin Author Sybre Waaijer

    (@cybr)

    Oh dear! I thought you faced a more common issue. Thank you for sharing those details; it helps to deduce a lot.

    I think I found the issue: your hosting provider blocks visitors when there are too many active concurrent requests.

    Google is not a saint when it comes to this, but I’m sure they’ll retry later.
    For now, as the message states, Google has fallen back to the last known good robots.txt file, which is why your site’s still on Google.

    Just try hitting CTRL+R a few times rapidly on your site, and you’ll see this poorly written “English” message:

    This site/page has used all avaialble php / apache processes allowed on free hosting account.

    Refreshing the page once the amount of apache / php processes are reduced will cause the site to work

    We would recommend upgrading your hosting account at [some host] , premium hosting accounts have MUCH higher resources dedicated to them.
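The rapid CTRL+R test can also be scripted. The sketch below is only an illustration: it sends a handful of simultaneous requests and counts how many responses contain the host's limit message. The function names, the default of 8 requests, and the marker phrase (copied from the message quoted above) are assumptions; the host may word or serve the message differently. Only run something like this against your own site, and keep the count small.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen
from urllib.error import HTTPError

# Assumption: a phrase taken from the host's message quoted above.
LIMIT_MARKER = "processes allowed on free hosting account"


def fetch_page(url):
    """Return (status, body); status is None if no HTTP response arrived."""
    try:
        with urlopen(url, timeout=15) as resp:
            return resp.status, resp.read().decode(errors="replace")
    except HTTPError as err:
        return err.code, err.read().decode(errors="replace")
    except OSError:
        # Covers URLError, timeouts, and connection resets.
        return None, ""


def burst(url, count=8, marker=LIMIT_MARKER, fetch=fetch_page):
    """Send `count` simultaneous requests and tally ok / limited / failed."""
    with ThreadPoolExecutor(max_workers=count) as pool:
        results = list(pool.map(fetch, [url] * count))
    tally = {"ok": 0, "limited": 0, "failed": 0}
    for status, body in results:
        if status is None:
            tally["failed"] += 1
        elif marker in body:
            tally["limited"] += 1
        elif status == 200:
            tally["ok"] += 1
        else:
            tally["failed"] += 1
    return tally
```

If `burst("https://example.com/")`, pointed at your own domain, reports any "limited" responses, the host is most likely turning away requests (including Googlebot's) whenever too many arrive at once.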

    I know you’re hosting at about $2/month (or even free?) with excellent performance, so I don’t know if you’re willing to spend more on that. I advise looking at how the business grows before adding more costs.

    Why doesn’t this happen to the other sites you have with the same hosting provider? I can’t tell from here, but it could be that the newer site receives a lot more traffic.

    I believe the message you faced will disappear the next time Google tries to obtain the robots.txt file. Keep me posted!

    Thread Starter xxxhoop

    (@xxxhoop)

    Thanks for your detailed help. You nailed it.

  • The topic ‘Robots.Txt Blocked’ is closed to new replies.