• Resolved Sn00z389

    (@webmakers2011)


    Hello,

    I have noticed that Google has indexed many pages even though I have disallowed them in the robots.txt file.

    Like these:

    Disallow: /*add-to-cart=*

    However, Google keeps indexing those, and the problem affects multiple websites. Can you make URLs with a ? query string carry a canonical tag pointing to the current page so that Google stops indexing them?
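    To illustrate which URLs a rule like the one above covers, here is a minimal Python sketch of Google-style wildcard matching (`*` matches any run of characters, a trailing `$` anchors the end). The helper names `robots_pattern_to_regex` and `is_blocked` are hypothetical, and the sketch deliberately ignores Allow rules, user-agent groups, and longest-match precedence, so treat it as an approximation rather than a full robots.txt parser. Note that a match only means the URL is blocked from crawling, not from indexing.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: only '*' (any characters) and a trailing '$' (end of URL)
    are treated as special, as in Google's documented wildcard syntax.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal parts and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path_and_query: str, disallow_patterns: list[str]) -> bool:
    """Return True if any Disallow pattern matches the path plus query string.

    Simplification: Allow rules and rule precedence are not modelled.
    """
    return any(
        robots_pattern_to_regex(p).match(path_and_query)
        for p in disallow_patterns
    )


if __name__ == "__main__":
    rules = ["/*add-to-cart=*"]
    print(is_blocked("/shop/?add-to-cart=123", rules))
    print(is_blocked("/shop/", rules))
```

    With the rule from this post, a listing URL that carries an add-to-cart parameter matches, while the plain listing URL does not.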

    The problem started in October 2022; before that, there was no such problem.

    Having this many pages indexed decreases crawl budget, and the important pages are not being ranked properly.

Viewing 6 replies - 1 through 6 (of 6 total)
  • Hey @webmakers2011,

    Thank you for using Yoast SEO and for reaching out!

    In normal scenarios, we’d expect Yoast SEO to output a self-referring canonical URL on a page/post. For instance, if you’d link to your cart page with ?add-to-cart=productid – that page should redirect to /cart/ and just have the canonical of the /cart/ page.

    Can you share a URL or screenshots of where that is not happening on your site?

    Thread Starter Sn00z389

    (@webmakers2011)

    Sure Jeroen,

    it is happening on all of the category pages/product tag pages/brand pages where products are listed.

    Attached is a screenshot from Google Search Console, which reports these pages as indexed even though they are disallowed in the robots.txt file.

    Screenshot: https://ibb.co/cwq5j2P

    This has happened on 4 of my websites that use Yoast, and it has been happening since October 2022.

    I also have other websites using the RankMath plugin, and they do not seem to have this problem.

    Plugin Support Maybellyne

    (@maybellyne)

    Hello @webmakers2011

    Thanks for sharing more information. First, I’ll mention that a robots.txt file provides crawl directives, not indexing directives. URLs blocked by robots.txt might still get indexed. Google may index a page it hasn’t crawled if there are links to it. The page ends up in the index, but Googlebot hasn’t crawled it, so Google does not know what it contains.

    Your screenshot shows that the wishlist & cart-related pages are reported in the Google Search Console as Indexed, though blocked by robots.txt. According to Google’s coverage report:

    Indexed, though blocked by robots.txt
    The page was indexed despite being blocked by your website’s robots.txt file. Google always respects robots.txt, but this doesn’t necessarily prevent indexing if someone else links to your page. Google won’t request and crawl the page, but we can still index it, using the information from the page that links to your blocked page. Because of the robots.txt rule, any snippet shown in Google Search results for the page will probably be very limited.
    Next steps:
    1. If you do want to block this page from Google Search, robots.txt is not the correct mechanism to avoid being indexed. To avoid being indexed, remove the robots.txt block and use ‘noindex’.
    2. If you do not want to block this page, update your robots.txt file to unblock your page. You can use the robots.txt tester to determine which rule is blocking this page.

    In summary, since you have set the wish list and cart pages to noindex, remove the crawling directive from the robots.txt file so that Googlebot can crawl those pages and see the noindex directive.
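    As a rough way to confirm what a crawler would see once the page is unblocked, the following Python sketch extracts the robots meta directives and the canonical link from a page's HTML, and also considers an `X-Robots-Tag` response header. The names `IndexingSignals`, `indexing_signals`, and `SAMPLE_HTML` are hypothetical, the sample markup is made up for illustration, and the sketch does not fetch anything or reproduce how Google actually evaluates these signals.

```python
from html.parser import HTMLParser


class IndexingSignals(HTMLParser):
    """Collect <meta name="robots"> tokens and the first rel=canonical href."""

    def __init__(self):
        super().__init__()
        self.robots = []
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "robots":
            content = a.get("content") or ""
            self.robots += [t.strip().lower() for t in content.split(",")]
        elif (
            tag == "link"
            and (a.get("rel") or "").lower() == "canonical"
            and self.canonical is None
        ):
            self.canonical = a.get("href")


def indexing_signals(html, headers=None):
    """Return the noindex flag and canonical URL found in HTML and headers."""
    parser = IndexingSignals()
    parser.feed(html)
    header_value = ""
    for key, value in (headers or {}).items():
        if key.lower() == "x-robots-tag":
            header_value = value
    header_tokens = [t.strip().lower() for t in header_value.split(",")]
    tokens = parser.robots + header_tokens
    return {"noindex": "noindex" in tokens, "canonical": parser.canonical}


# Made-up markup for illustration only.
SAMPLE_HTML = (
    '<html><head><meta name="robots" content="noindex, follow">'
    '<link rel="canonical" href="https://example.com/cart/"></head>'
    "<body></body></html>"
)

if __name__ == "__main__":
    print(indexing_signals(SAMPLE_HTML))
```

    The key point is that these signals live in the page response itself, so a crawler can only read them if robots.txt allows it to fetch the URL.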

    I hope that helps.

    Thread Starter Sn00z389

    (@webmakers2011)

    Thank you! I have removed the directives from robots.txt, and now I am waiting to see whether Google will be able to determine the right canonicalization of the URLs.

    I will update when I see the result.

    Plugin Support devnihil

    (@devnihil)

    @webmakers2011 Thanks for your reply and please let us know the results once Google has had time to update.

    Thread Starter Sn00z389

    (@webmakers2011)

    Hi,

    Removing everything that was blocked in robots.txt, except /wp-admin/, resolved the problem.

    Google was able to crawl the URLs and set the correct canonical path. Errors in GSC disappeared!

  • The topic ‘Canonical for add-to-cart and wishlist links’ is closed to new replies.