Hi again!
The robots.txt file can prevent crawlers from accessing a page altogether. But when your page contains a descriptive link to that page, robots can still index it (although not necessarily crawl and process it). See https://support.google.com/webmasters/answer/6062608.
This is why we advise against altering the robots.txt file beyond what TSF outputs. Doing so isn't entirely futile, but it prevents you from directing the search engine any further.
A robots header or meta tag, on the other hand, allows the crawler to access the page, but directs it not to index, follow, or archive the page at all. A “noindex” robots header or meta tag can, therefore, remove those ?replytocom pages from search engine indexes, much like a canonical URL could.
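To make the difference concrete, here is a deliberately simplified model of the reasoning above. It is only a sketch under stated assumptions: the `Page` fields and the `can_be_indexed` function are hypothetical and do not describe how any particular search engine works internally. The point it shows is that a robots.txt block stops the crawler from ever reading a noindex directive, so a linked URL may still be indexed.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical, simplified view of one URL from a crawler's perspective."""
    blocked_by_robots_txt: bool  # a robots.txt Disallow rule matches this URL
    noindex: bool                # the page sends a "noindex" robots header or meta tag
    linked_from_elsewhere: bool  # some crawled page links to this URL


def can_be_indexed(page: Page) -> bool:
    """Simplified model: can this URL end up in a search engine's index?"""
    if page.blocked_by_robots_txt:
        # The crawler never fetches the page, so it never sees the noindex
        # directive. The bare URL can still be indexed from links pointing to it.
        return page.linked_from_elsewhere
    if page.noindex:
        # The crawler fetches the page, reads the directive, and leaves it out.
        return False
    # Crawlable and not marked noindex: eligible for indexing.
    return True


# A ?replytocom URL blocked in robots.txt, but linked from the post itself:
blocked = Page(blocked_by_robots_txt=True, noindex=True, linked_from_elsewhere=True)
print(can_be_indexed(blocked))    # True: the noindex tag is never read

# The same URL left crawlable, with a noindex robots meta tag:
crawlable = Page(blocked_by_robots_txt=False, noindex=True, linked_from_elsewhere=True)
print(can_be_indexed(crawlable))  # False
```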
The SEO tool is somewhat correct in its assessment, since removing the canonical URL helps the noindex directive take effect sooner. We have mechanisms in The SEO Framework to prevent the canonical URL and the noindex directive from being output alongside each other. However, conditions apply. You can read more about this here: https://github.com/sybrew/the-seo-framework/issues/370. The description in our code reads: “If the page should not be indexed, and the permalink matches the canonical URL, empty [the canonical URL].” This condition is never met with ?replytocom queries.
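The quoted rule can be sketched roughly as follows. This is not The SEO Framework's actual code (which is PHP); `should_empty_canonical` and `canonical_for` are hypothetical helpers. The sketch assumes the compared permalink is the requested URL including its query string, while the canonical URL drops that query, which is why the two never match for ?replytocom requests.

```python
from urllib.parse import urlsplit, urlunsplit


def should_empty_canonical(noindex: bool, permalink: str, canonical: str) -> bool:
    """Hypothetical sketch of the rule: if the page should not be indexed,
    and the permalink matches the canonical URL, empty the canonical URL."""
    return noindex and permalink == canonical


def canonical_for(url: str) -> str:
    """Hypothetical: treat the canonical URL as the URL without its query string."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


post = "https://example.com/hello-world/"
reply = post + "?replytocom=42"

# Noindexed post: permalink and canonical match, so the canonical is emptied.
print(should_empty_canonical(True, post, canonical_for(post)))    # True

# ?replytocom request: the query makes the URLs differ, so the canonical stays.
print(should_empty_canonical(True, reply, canonical_for(reply)))  # False
```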
I hope this helps! Cheers!
This reply was modified 4 years, 9 months ago by Sybre Waaijer. Reason: clarity