• Resolved olegan69

    (@olegan69)


    Hi!
    I can’t work out whether it’s possible to make changes to the robots.txt file.
    If it is, how do I do it?
    Thank you!

  • Plugin Author Sybre Waaijer

    (@cybr)

    Hello!

    Took me a while; sorry about that.

    I finally found a plugin that can write to robots.txt that is compatible with The SEO Framework: https://www.ads-software.com/plugins/robots-txt-quick-editor/.

    But it would be best if you didn’t modify that file: there are better ways to steer robots, namely the aptly named robots meta tag, canonical URLs, and redirects.

    To learn more, see https://developers.google.com/search/docs/crawling-indexing/robots/intro.
    Also, heed the warning of https://developers.google.com/search/docs/crawling-indexing/block-indexing.
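
    For completeness: if you’d rather not install another plugin, a few lines in a small must-use plugin or your child theme can append rules to the virtual robots.txt that WordPress generates, via the core robots_txt filter. This is only a rough sketch; it assumes no physical robots.txt file exists in the site root (a physical file bypasses the filter entirely), and the path is just a placeholder:

        // Append rules to WordPress's generated robots.txt output.
        add_filter( 'robots_txt', function ( $output, $public ) {
            // Only add rules when the site is set to be visible to search engines.
            if ( $public ) {
                // Placeholder path; the default output already opens a "User-agent: *" group.
                $output .= "Disallow: /example-private-path/\n";
            }
            return $output;
        }, 11, 2 );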

    apexdigitalro

    (@apexdigitalro)

    @cybr so I have the same question, but for WooCommerce. Basically, I have bots that keep adding products to the cart. What I’d like to do is disable crawling for the add to cart function, as well as /cart/, /checkout/ etc.

    Is that possible with meta tags? I suppose it is. But what about the actual add to cart JS? /*add-to-cart=*

    Plugin Author Sybre Waaijer

    (@cybr)

    Hello!

    Bots shouldn’t follow any links blocked in robots.txt, no matter what initiates them, whether plain HTML or a link inserted or triggered via JS. From https://developers.google.com/search/docs/crawling-indexing/javascript/javascript-seo-basics:

    When Googlebot fetches a URL from the crawling queue by making an HTTP request, it first checks if you allow crawling. Googlebot reads the robots.txt file. If it marks the URL as disallowed, then Googlebot skips making an HTTP request to this URL and skips the URL.

    Still, exceptions apply; from https://developers.google.com/search/docs/crawling-indexing/robots/intro:

    A page that’s disallowed in robots.txt can still be indexed if linked to from other sites.
    While Google won’t crawl or index the content blocked by a robots.txt file, we might still find and index a disallowed URL if it is linked from other places on the web.

    Hence I believe robots.txt is a weak defense mechanism and shouldn’t be relied upon. However, it may alleviate server load and steer bots in the right direction until an actual block is installed.

    I recommend adding nofollow to links pointing at pages you don’t wish to be crawled, and a noindex tag on those pages. If bots still land on them and process many checkouts, a User-Agent-based WAF block is your best bet.
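
    If you want to try the code route, here is a rough, untested sketch of both ideas. It assumes WordPress 5.7+ (for the wp_robots filter) and an active WooCommerce install (for is_cart()/is_checkout()); note that The SEO Framework and recent WooCommerce versions may already set noindex on these pages, so check the rendered meta tags first:

        // Ask crawlers not to index or follow the cart and checkout pages.
        add_filter( 'wp_robots', function ( $robots ) {
            if ( function_exists( 'is_cart' ) && ( is_cart() || is_checkout() ) ) {
                $robots['noindex']  = true;
                $robots['nofollow'] = true;
            }
            return $robots;
        } );

        // Discourage crawling of cart, checkout, and add-to-cart URLs via the virtual robots.txt.
        add_filter( 'robots_txt', function ( $output ) {
            $output .= "Disallow: /cart/\n";
            $output .= "Disallow: /checkout/\n";
            $output .= "Disallow: /*add-to-cart=*\n";
            return $output;
        }, 11 );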

    Thread Starter olegan69

    (@olegan69)

    The Yandex search engine advises:
    If there are duplicates in the search due to GET parameters, we recommend using the Clean-param directive in robots.txt so that the robot ignores insignificant GET parameters and combines all signals from copy pages on the main page. When the robot learns of the changes made, pages with insignificant GET parameters will disappear from the search.

    Instruction Reference:
    https://yandex.ru/support/webmaster/robot-workings/clean-param.html

    At the moment there’s no way to add the Clean-param directive to robots.txt. :(

    Plugin Author Sybre Waaijer

    (@cybr)

    Hello!

    Above, I recommended this plugin for modifying the robots.txt output; it works well with The SEO Framework: https://www.ads-software.com/plugins/robots-txt-quick-editor/.

    Are you facing issues with that plugin?

    Please note that other search engines do not honor the Clean-param directive.
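
    If that plugin doesn’t suit you, the same core robots_txt filter mentioned above can emit the directive. This is an untested sketch, and the parameter names are only placeholders; substitute the GET parameters Yandex actually flags for your site:

        // Add a Yandex-only Clean-param directive to the generated robots.txt.
        add_filter( 'robots_txt', function ( $output ) {
            $output .= "\nUser-agent: Yandex\n";
            $output .= "Clean-param: utm_source&utm_medium&utm_campaign /\n";
            return $output;
        }, 11 );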

    Thread Starter olegan69

    (@olegan69)

    Thank you very much for the help and the time you give each user!

  • The topic ‘Changes to robots.txt’ is closed to new replies.