Viewing 8 replies - 1 through 8 (of 8 total)
  • Same problem here. Using Polylang and WP 3.9.1.

    @loewenherz – sorry I missed your post before. Are you still having issues? The sitemap (with or without slash, preferably without) looks good… It might have been a caching issue.

    @paralyys – can you share a link?

    Hey:
    Webmaster tools screenshot.

    The culprit was a “hardcoded” robots.txt:

    sitemap: https://xxx/sitemap.xml
    User-agent:  *
    # disallow all files in these directories
    Disallow: /cgi-bin/
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /wp-content/
    Disallow: /archives/
    disallow: /*?*
    Disallow: *?replytocom
    Disallow: /wp-*
    Disallow: /author
    Disallow: /comments/feed/
    User-agent: Mediapartners-Google*
    Allow: /

    But now I have another problem. The WP-generated robots.txt is:

    # XML Sitemap & Google News Feeds version 4.3.2 - https://status301.net/wordpress-plugins/xml-sitemap-feed/
    Sitemap: https://xxx.xxx.com/sitemap.xml
    
    User-agent: *
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: */xmlrpc.php
    Disallow: */wp-*.php
    Disallow: */trackback/
    Disallow: *?wptheme=
    Disallow: *?comments=
    Disallow: *?replytocom
    Disallow: */comment-page-
    Disallow: *?s=
    Disallow: */wp-content/
    Allow: */wp-content/uploads/

    and Webmaster Tools says “Sitemap contains urls which are blocked by robots.txt” – screenshot

    I tried adding

    Allow: */sitemap-home.xml
    Allow: */sitemap-posttype-page.xml
    Allow: */sitemap-posttype-post.xml

    to robots.txt through settings but no dice.

    Sorry for the secrecy and thanks for the help.

    @paralyys – the rules you added should explicitly allow access to these sitemaps. I cannot see why access would be blocked by the current robots.txt rules. Are you sure the old robots.txt is not cached somewhere, for example in a server cache? Sometimes, as a logged-in user, you see something different from what anonymous requests get. You can use an excellent tool like https://web-sniffer.net (with the option “Raw” enabled) to see what Googlebot would see. Also, in your Webmaster Tools you can find ‘Fetch as Google’. Use this to test the robots.txt and the different sitemaps…

    If all else fails, you can contact me directly on https://status301.net/contact-en/ to send me the URL of your site privately.

    By the way, the static robots.txt does not look very different from the dynamic one… It still does not explain why your sitemap urls would be blocked.
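A rough way to double-check this by hand is a small Python sketch like the one below. It assumes Google's documented precedence (the longest matching rule wins, and Allow wins a tie) plus `*` and `$` wildcards. The helper names `rule_to_regex` and `is_allowed` are made up for illustration, and the rule list is copied from the dynamic robots.txt posted above.

```python
import re

# Rules copied from the WP-generated robots.txt in this thread.
RULES = [
    ("Disallow", "/wp-admin/"),
    ("Disallow", "/wp-includes/"),
    ("Disallow", "*/xmlrpc.php"),
    ("Disallow", "*/wp-*.php"),
    ("Disallow", "*/trackback/"),
    ("Disallow", "*?wptheme="),
    ("Disallow", "*?comments="),
    ("Disallow", "*?replytocom"),
    ("Disallow", "*/comment-page-"),
    ("Disallow", "*?s="),
    ("Disallow", "*/wp-content/"),
    ("Allow", "*/wp-content/uploads/"),
]


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    Assumption: longest matching pattern wins; on equal length, Allow wins.
    """
    best = None  # (pattern length, is_allow)
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if rule_to_regex(pattern).match(path):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


if __name__ == "__main__":
    for path in ("/sitemap.xml", "/sitemap-home.xml", "/wp-content/plugins/x.js"):
        print(path, "allowed" if is_allowed(RULES, path) else "blocked")
```

Under these assumptions none of the sitemap paths match any Disallow rule, which fits the point that robots.txt alone does not explain the warning.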

    Hey: here’s the Google-fetched sitemap.xml from Webmaster Tools https://pastebin.com/KtDtMZsa and the robots.txt https://pastebin.com/Tx3LuaZE

    (I just didn’t want to leave the domain up here, the pastebins will decay in a week)
    Maybe the subdomain setup is to blame? I’m really a bit lost here.

    No, the subdomain is no problem. But I cannot figure out what is…

    Funny thing: the first time I tried to access your sitemap, I got redirected to the English about page. Only after accessing the Estonian pages could I visit the sitemap. You can reproduce this issue by testing your /sitemap.xml via https://web-sniffer.net (for example), where you can see the response is:

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <html><head>
    <title>302 Found</title>
    </head><body>
    <h1>Found</h1>
    <p>The document has moved <a href="https://xxx.xxx.xx">here</a>.</p>
    <hr>
    <address>Apache / DataZone Server at xxx.xxx.xx Port 80</address>
    </body></html>

    instead of the requested sitemap…
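If you prefer a script over web-sniffer, something like the following Python sketch can do the same check. It requests the URL without following redirects and reports whether the answer is a sitemap, a redirect, or an HTML page. The function names and the User-Agent string are my own placeholders, not anything from the plugin, and the heuristic only looks at the start of the body.

```python
import http.client
from urllib.parse import urlsplit

# Assumed User-Agent for illustration; adjust to whatever you want to imitate.
BOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def fetch_without_redirects(url, user_agent=BOT_UA):
    """Fetch `url` once and return (status, Location header, body text).

    http.client never follows redirects, so a 302 shows up as a 302.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    target = (parts.path or "/") + ("?" + parts.query if parts.query else "")
    try:
        conn.request("GET", target, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        body = resp.read().decode("utf-8", errors="replace")
        return resp.status, resp.getheader("Location"), body
    finally:
        conn.close()


def diagnose_sitemap_response(status, location, body):
    """Classify a response to a sitemap request (rough heuristic)."""
    if 300 <= status < 400:
        return "redirect to " + (location or "unknown location")
    if status != 200:
        return "HTTP error %d" % status
    head = body.lstrip()[:500].lower()
    if "<urlset" in head or "<sitemapindex" in head:
        return "sitemap"
    if head.startswith("<!doctype html") or "<html" in head:
        return "html page"
    return "unknown content"


if __name__ == "__main__":
    # Replace with the real sitemap URL; the domain is elided in this thread.
    status, location, body = fetch_without_redirects("https://example.com/sitemap.xml")
    print(diagnose_sitemap_response(status, location, body))
```

For the response quoted above this would report a redirect rather than a sitemap, which is most likely what Google is seeing when it calls the sitemap HTML.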

    I wonder if it is a particular setting in Polylang or if you set up a redirect manually? Or is it maybe the fact that there are NO posts in the English language? WordPress is known to behave badly (returning 404 on feeds for example) when there are no posts.

    Try disabling the language slug in post/page URLs and make the home page URL default to the / and /en/ locations. And maybe test the auto-detect visitor language option. Let me know if/when that changes anything.

  • The topic ‘Google Webmastertools: News Sitemap is HTML’ is closed to new replies.