Hello!
Bots shouldn’t follow any links blocked in robots.txt, no matter how the link is triggered, whether it’s plain HTML or a link inserted and activated via JS. From https://developers.google.com/search/docs/crawling-indexing/javascript/javascript-seo-basics:
> When Googlebot fetches a URL from the crawling queue by making an HTTP request, it first checks if you allow crawling. Googlebot reads the robots.txt file. If it marks the URL as disallowed, then Googlebot skips making an HTTP request to this URL and skips the URL.
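For illustration, a minimal robots.txt rule that keeps compliant crawlers away from a checkout path would look like this (`/checkout/` is a placeholder; adjust it to your actual URLs):

```
User-agent: *
Disallow: /checkout/
```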
Still, exceptions apply; from https://developers.google.com/search/docs/crawling-indexing/robots/intro:
> A page that’s disallowed in robots.txt can still be indexed if linked to from other sites.

> While Google won’t crawl or index the content blocked by a robots.txt file, we might still find and index a disallowed URL if it is linked from other places on the web.
Hence, I believe robots.txt is a weak defense mechanism and shouldn’t be relied upon on its own. However, it can reduce server load and steer well-behaved bots in the right direction until a proper block is in place.
I recommend using `nofollow` on links to, and a `noindex` tag on, pages you don’t wish to be crawled or indexed. If bots still land on the site and process many checkouts, a User-Agent-based WAF block is your best bet.