Sitemap truncates URLs and shows URLs that don’t exist
I seem to have an issue with the Yoast XML Sitemap. It seems that it truncates some post URLs and only shows the first “category” slug of the post URL.
This seems to lead to many 404 hits from bots trying to access these non-existent URLs, presumably after having found them in the site map. These also show up as errors in the Google Search Console (the site map has been submitted to GSC):
“Submitted URL has crawl issue”
or
“Submitted URL not found (404)”
In one case, the (truncated) URL that is shown in the sitemap also causes an infinite redirect loop:
https://www.bkwine.com/reviews/wine-of-the-month/
(There’s no redirect on that URL, so the loop must be created by some internal redirection function in WordPress.)
There are numerous URLs like this:
https://www.bkwine.com/reviews/producer-recommendations/australia-wineries/
https://www.bkwine.com/sv/reportage/vintillverkning-vinodling/
https://www.bkwine.com/reviews/producer-recommendations/australia-wineries/
(For these three, and a few more, I have created a redirect to the real archive page so they no longer return a 404.)
These URLs don’t really exist on the site. They look like category archives (the slugs all belong to category archives), but a correct category archive URL should have /category/ in it.
Looking at the sitemap, it seems quite obvious that these are post URLs that have been truncated: the actual post slug has been chopped off, leaving URLs that certainly should not be in the sitemap.
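For anyone who wants to check their own sitemap for the same pattern, here is a minimal Python sketch. It assumes the sitemap page has already been downloaded as XML text and that it uses the standard sitemaps.org namespace. The function name `find_suspect_urls` and the example.com URLs are made up for illustration. It flags `<loc>` values that appear more than once and `<loc>` values that are a prefix of another entry. A prefix hit is only a hint, since a legitimate parent page would be flagged too.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard sitemap namespace; <image:loc> entries live in a different
# namespace and are therefore ignored by the lookup below.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def find_suspect_urls(sitemap_xml):
    """Return (duplicates, prefixes) for the <loc> values in one sitemap page.

    duplicates: dict mapping a URL to how often it appears (only if > 1)
    prefixes:   set of URLs that are a leading part of another listed URL
    """
    root = ET.fromstring(sitemap_xml)
    locs = [
        el.text.strip()
        for el in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if el.text
    ]

    counts = Counter(locs)
    duplicates = {loc: n for loc, n in counts.items() if n > 1}

    # In sorted order, a URL that is a prefix of any other URL is always
    # a prefix of its immediate successor, so one pass is enough.
    ordered = sorted(counts)
    prefixes = {a for a, b in zip(ordered, ordered[1:]) if b.startswith(a)}

    return duplicates, prefixes


# Hypothetical sample data, not taken from the real sitemap.
SAMPLE = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/news/</loc></url>
  <url><loc>https://example.com/news/</loc></url>
  <url><loc>https://example.com/news/post-one/</loc></url>
  <url><loc>https://example.com/about/</loc></url>
</urlset>
"""

if __name__ == "__main__":
    dupes, parents = find_suspect_urls(SAMPLE)
    print("Duplicated:", dupes)
    print("Prefix of another entry:", parents)
```

Entries that show up in both results are the most likely candidates for truncated post URLs.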
I have tested this with all other plugins disabled (everything except Yoast SEO) and regenerated the sitemap (turning it off, clearing the cache, turning it on), and the resulting sitemap still has these errors.
You can see an example, showing mostly URLs ending in the /news/ category slug, but also the /reviews/producer-recommendations/australia-wineries/ slug, in this screenshot of the XML sitemap page 4:
https://www.bkwine.com/wp-content/uploads/2020/10/sitemap.jpg
You can of course access the sitemap on https://www.bkwine.com/sitemap_index.xml
You can see that all those /news/ URLs relate to different pages since they have varying numbers of images.
Most of these errors seem to be on page 4 of the sitemap (it’s hard to know whether that is all of them). Looking at the last-modified dates, they were all last modified within almost the same minute. One possibility is that this could have to do with an import of posts from an older blog (on Blogger) that might have been done at that time.
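The last-modified clustering can be checked with a similar sketch. It assumes each `<url>` entry carries a `<lastmod>` in the W3C datetime format that the sitemap protocol specifies, and that all entries use the same timezone offset, so cutting the string at the minute gives comparable values. The function name `lastmod_minutes` and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def lastmod_minutes(sitemap_xml):
    """Count sitemap entries per last-modified minute (YYYY-MM-DDThh:mm)."""
    root = ET.fromstring(sitemap_xml)
    minutes = Counter()
    for url in root.findall("sm:url", SITEMAP_NS):
        lastmod = url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS)
        lastmod = lastmod.strip()
        if lastmod:
            # The first 16 characters cover date, hour and minute.
            minutes[lastmod[:16]] += 1
    return minutes


# Hypothetical sample data, not taken from the real sitemap.
LASTMOD_SAMPLE = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/news/</loc><lastmod>2020-10-01T12:34:05+00:00</lastmod></url>
  <url><loc>https://example.com/news/</loc><lastmod>2020-10-01T12:34:41+00:00</lastmod></url>
  <url><loc>https://example.com/about/</loc><lastmod>2020-10-02T08:00:00+00:00</lastmod></url>
</urlset>
"""

if __name__ == "__main__":
    for minute, count in lastmod_minutes(LASTMOD_SAMPLE).most_common(5):
        print(minute, count)
```

A single minute with an unusually high count would be consistent with a bulk import, although it would not by itself explain the truncation.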
But that does not explain why the Yoast site map shows a truncated URL instead of the full URL.
It is certainly concerning since it reduces the indexation of the site and also causes Search Console errors and lots of 404s.