Hello!
I wouldn’t worry too much about this. A few additional links in the sitemap won’t hurt. I kept pages with custom canonical URLs in the sitemap so the canonical URL change could be inspected more quickly, thanks to the updated “last modified” value. What I did is actually against Google’s recommendation; see https://developers.google.com/search/docs/crawling-indexing/consolidate-duplicate-urls (I’ll quote the important parts below).
“Don’t specify different URLs as canonical for the same page using different canonicalization techniques (for example, don’t specify one URL in a sitemap, but a different URL for that same page using rel="canonical").”
Still, search engines won’t assume a canonical URL from the sitemap if the page has a canonical URL tag. So, this is a non-issue, and Google’s giving conflicting information.
On pages, you can’t specify both a blocking robots directive and the canonical URL; they are mutually exclusive. So, you could apply “noindex” to those pages, and they’ll be removed from the sitemap. But there’s a caveat: those pages (and their canonical URL) will be forgotten by search engines after a while because of this directive. Quoting Google again:
“Don’t use noindex as a means to prevent [the] selection of a canonical page. This rule is intended to exclude the page from the index, not to manage the choice of a canonical page.”
It’s still unclear and conflicting information from Google, but it is closer to reality.
Ultimately, the best thing you can do is apply a 301 redirect to those pages. This will force search engines and users to honor the canonical version of the page. Or you could ignore the issue. Or you could exclude those pages from the sitemap by passing the post IDs found with a meta query to the filter the_seo_framework_sitemap_exclude_ids, like so (untested!):
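add_filter(
	'the_seo_framework_sitemap_exclude_ids',
	function ( $ids ) {
		global $wpdb;

		// Collect the IDs of all posts that have the custom canonical URL meta key.
		$post_ids = $wpdb->get_col(
			$wpdb->prepare(
				"SELECT post_id FROM $wpdb->postmeta WHERE meta_key = %s",
				'_genesis_canonical_uri'
			)
		);

		// Append rather than union: `+` would skip IDs whose numeric keys already exist in $ids.
		return array_merge( $ids, array_map( 'intval', $post_ids ) );
	}
);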
Re-save SEO settings to clear the sitemap’s cache; otherwise, the filter won’t appear to have an effect.
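One thing to watch when combining ID lists in a filter like this: on arrays, PHP's `+` operator is a union by key, so with two numerically indexed lists any entry whose index already exists on the left side is dropped, whereas `array_merge` appends and renumbers. Here is a small standalone illustration; the IDs are made up and it doesn't touch WordPress at all:

```php
<?php
// Hypothetical post IDs, for illustration only.
$ids      = [ 1, 2, 3 ];          // keys 0, 1, 2
$post_ids = [ 10, 20, 30, 40 ];   // keys 0, 1, 2, 3

// Union by key: keys 0-2 already exist in $ids, so only key 3 (value 40) is added.
$union = $ids + $post_ids;

// Append with renumbered keys: every ID from both lists is kept.
$merged = array_merge( $ids, $post_ids );

var_dump( $union === [ 1, 2, 3, 40 ] );               // bool(true)
var_dump( $merged === [ 1, 2, 3, 10, 20, 30, 40 ] );  // bool(true)
```

So if the incoming list already holds a few IDs, a union would silently leave some of the queried post IDs out of the exclusion list.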