• Resolved philraymond

    (@philraymond)


    I read in your knowledgebase, “You can exclude posts and pages from the sitemap by applying noindex to them.”

    But I’m using canonical on some pages instead of noindex and I’d like the canonical pages excluded from the sitemap. Anything I can do? Thanks!

Viewing 2 replies - 1 through 2 (of 2 total)
  • Plugin Author Sybre Waaijer

    (@cybr)

    Hello!

    I wouldn’t worry too much about this. A few additional links in the sitemap won’t hurt. I kept pages with custom canonical URLs in the sitemap so that a canonical URL change gets inspected more quickly, thanks to the updated “last modified” value. This actually goes against Google’s recommendation; see https://developers.google.com/search/docs/crawling-indexing/consolidate-duplicate-urls (I’ll quote the important part below):

    Don’t specify different URLs as canonical for the same page using different canonicalization techniques (for example, don’t specify one URL in a sitemap, but a different URL for that same page using rel="canonical").

    Still, search engines won’t assume a canonical URL from the sitemap if the page has a canonical URL tag. So, this is a non-issue, and Google’s giving conflicting information.

    On pages, you can specify both a blocking robots directive and the canonical URL; they are not mutually exclusive. So, you could apply “noindex” to those pages, and they’ll be removed from the sitemap. But there’s a caveat: because of this directive, search engines will forget those pages (and their canonical URL) after a while. Quoting Google again:

    Don’t use noindex as a means to prevent [the] selection of a canonical page. This rule is intended to exclude the page from the index, not to manage the choice of a canonical page.

    It’s still unclear and conflicting information from Google, but it is closer to reality.

    Ultimately, the best thing you can do is apply a 301 redirect to those pages. This will force search engines and users to honor the canonical version of the page. Or you could ignore the issue. Or you could use a meta query to collect the IDs of posts that have a custom canonical URL and pass them to the filter the_seo_framework_sitemap_exclude_ids, like so (untested!):

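    add_filter(
    	'the_seo_framework_sitemap_exclude_ids',
    	function( $ids ) {
    		global $wpdb;
    
    		// Find every post that has a non-empty custom canonical URL.
    		$post_ids = $wpdb->get_col( $wpdb->prepare(
    			"SELECT post_id FROM $wpdb->postmeta WHERE meta_key = %s AND meta_value != ''",
    			'_genesis_canonical_uri'
    		) );
    
    		// array_merge() appends; the + operator would drop IDs whose numeric keys collide.
    		return array_merge( $ids, array_map( 'intval', $post_ids ) );
    	}
    );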

    Re-save SEO settings to clear the sitemap’s cache; otherwise, the filter won’t appear to have an effect.
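
    For the 301 redirect option, a minimal sketch could look like the following (also untested!). It assumes the custom canonical URL is stored under the same `_genesis_canonical_uri` meta key used above, and that redirecting every singular page with such a URL is what you want; adjust the conditions if only some pages should redirect.

    ```php
    add_action(
    	'template_redirect',
    	function() {
    		// Only act on single posts and pages.
    		if ( ! is_singular() ) {
    			return;
    		}

    		$post_id = get_queried_object_id();

    		// Assumption: the custom canonical URL lives under this meta key.
    		$canonical = get_post_meta( $post_id, '_genesis_canonical_uri', true );

    		// Skip when no URL is set, or when it points at this same page (avoids a redirect loop).
    		if ( ! $canonical || untrailingslashit( $canonical ) === untrailingslashit( get_permalink( $post_id ) ) ) {
    			return;
    		}

    		// wp_redirect() permits external targets; use wp_safe_redirect() to restrict to allowed hosts.
    		wp_redirect( esc_url_raw( $canonical ), 301 );
    		exit;
    	}
    );
    ```

    Once a page redirects, visitors can no longer reach its own content, so this only makes sense when the page is a true duplicate of its canonical target.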

    Thread Starter philraymond

    (@philraymond)

    Thanks for the detailed answer. I think I’ll leave it for now. The only annoyance is that some website audit tools flag it as duplicate content. But that’s their issue, not mine.

  • The topic ‘How to exclude canonical pages from sitemap?’ is closed to new replies.