• I’ve got a growing problem on my hands with crawler errors relating to pages not being found. Here is my config at this point.

    – WordPress 2.9.2
    – W3TC using Xcache
    – Global Translator Plugin

    Lately I’ve been watching my Webmaster Tools crawl errors increase on a daily basis, with URIs showing up like “https://www.MYdomainOMITTED.net/2009/07/pandora-one-review/&rurl=translate.google.com&lang=en&usg=ALkJrhhtYNwzqnBC5c96Tmkvo4QCXhcrmg%2F/page/2/” and the error reported as NOT FOUND, etc.

    That is what you get if you go to my site and try to translate one of my posts into another language when the requested translation has not already been cached: the plugin does a 302 redirect via the user’s browser to fetch a current translation. It used to do a 503, but the developer of Global Translator updated his software to use a 302 instead of a 503.

    If the page had in fact already been cached, the URI would look like “https://www.MYdomainOMITTED.net/it/2009/07/pandora-one-review/” (notice the /it subdirectory). WordPress and the Global Translator plugin would check and say: OK, this has already been cached, so don’t 302 to Google, serve the cached file already sitting in my Global Translator cache folder. And everything is gravy.
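
    To make the two cases concrete, here is a rough sketch of that decision as I understand it (the function name and cache layout are made up for illustration, not GT’s actual code):

    <?php
    // Sketch only, NOT Global Translator's real code: serve a cached
    // translation if one exists, otherwise 302 the visitor to Google
    // Translate for a live translation (older GT versions sent a 503 here).
    function gt_serve_translation( $lang, $permalink ) {
        // Hypothetical cache layout, e.g. wp-content/gt-cache/it/<hash>.html
        $cache_file = WP_CONTENT_DIR . '/gt-cache/' . $lang . '/' . md5( $permalink ) . '.html';

        if ( file_exists( $cache_file ) ) {
            status_header( 200 );    // cached: serve the file with a 200
            readfile( $cache_file );
            exit;
        }

        // Not cached yet: temporary redirect to Google Translate.
        $target = 'https://translate.google.com/translate?hl=en&sl=en&tl=' . rawurlencode( $lang )
                . '&u=' . rawurlencode( $permalink );
        wp_redirect( $target, 302 );
        exit;
    }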

    However, all the crawlers accessing my site seem to be hitting these links like crazy, so my WP backend is completely unable to cache these pages properly; the translation volume is already being consumed by the other crawlers. As a result, my GT (I’ll refer to Global Translator as just GT to save time) plugin status always shows that I am blocked for a period of time, and while it’s blocked, no new cache is being created. The search engine crawlers (including Googlebot) are still hammering away at these links, so they only ever see the 302 redirect, can’t index the pages, and keep filling my Webmaster Tools with errors.
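
    Here is roughly what I mean by “blocked”, as far as I can tell (the transient name and timing below are my guesses for illustration, not GT’s actual code):

    <?php
    // Sketch of the lockout behavior as I understand it (made-up names).
    // Once Google Translate starts refusing requests, the plugin backs off
    // and stops building new cache entries; every uncached URL keeps
    // answering with the 302 redirect in the meantime.
    function gt_can_build_cache() {
        return ! get_transient( 'gt_translate_blocked' );
    }

    function gt_record_translate_failure() {
        // Back off for an hour after Google refuses a translation request.
        set_transient( 'gt_translate_blocked', 1, 3600 );
    }

    So as long as the bots keep triggering failures, that lockout keeps renewing itself and the cache never catches up.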

    At first I thought I could just add those parameters in Webmaster Tools and Google would simply know it shouldn’t touch them anymore. Nope, it’s still happening just as much as before. So next I ran a test with the “Fetch as Googlebot” tool to see what’s being seen.

    First test: a new page I know has not been cached by GT

    URL: https://www.MYdomainOMITTED.net/cs/2010/04/how-to-compile-source-code/
    ===========================================================
    HTTP/1.1 302 Moved Temporarily
    Date: Sat, 17 Apr 2010 21:03:06 GMT
    Server: Apache
    X-Powered-By: W3 Total Cache/0.8.5.2
    Set-Cookie: PHPSESSID=hds7sh6oujumkehjj00cac42; path=/
    Expires: Thu, 19 Nov 1981 08:52:00 GMT
    Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
    Pragma: no-cache
    X-Pingback: https://www.MYdomainOMITTED.net/xmlrpc.php
    Location: https://translate.google.com/translate?hl=en&sl=en&tl=cs&u=http%3A%2F%2Fwww.MYdomainOMITTED.net%2F2010%2F04%2Fhow-to-compile-source-code%2F
    Keep-Alive: timeout=2, max=100
    Connection: Keep-Alive
    Transfer-Encoding: chunked
    Content-Type: text/html; charset=UTF-8

    Second test: a post I know is in fact in my cache, the way it should be.

    URL: https://www.MYdomainOMITTED.net/cs/what-do-you-want-from-me/
    ===================================================
    HTTP/1.1 200 OK
    Date: Sat, 17 Apr 2010 21:05:07 GMT
    Server: Apache
    X-Powered-By: W3 Total Cache/0.8.5.2
    Set-Cookie: PHPSESSID=0sd3eras2q4eiqk2hm198f86; path=/
    Expires: Thu, 19 Nov 1981 08:52:00 GMT
    Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
    Pragma: no-cache
    X-Pingback: https://www.MYdomainOMITTED.net/xmlrpc.php
    Keep-Alive: timeout=2, max=100
    Connection: Keep-Alive
    Transfer-Encoding: chunked
    Content-Type: text/html; charset=UTF-8
    […the rest of the HTML, as it should be…]

    Here is where all hell breaks loose. This is one of the latest crawl errors in my Webmaster Tools, a 404.

    URL: https://www.MYdomainOMITTED.net/2009/07/pandora-one-review/&rurl=translate.google.com&lang=en&usg=ALkJrhhtYNwzqnBC5c96Tmkvo4QCXhcrmg%2F/page/2/
    ===================================================
    HTTP/1.1 301 Moved Permanently
    Date: Sat, 17 Apr 2010 21:38:01 GMT
    Server: Apache
    X-Powered-By: W3 Total Cache/0.8.5.2
    Set-Cookie: PHPSESSID=79e3g0to4rmjt4ogqtbu5e7; path=/
    Expires: Thu, 19 Nov 1981 08:52:00 GMT
    Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
    Pragma: no-cache
    Location: https://www.MYdomainOMITTED.net/pandora-one-review/&lang=en
    Keep-Alive: timeout=2, max=100
    Connection: Keep-Alive
    Transfer-Encoding: chunked
    Content-Type: text/html

    Now that’s a 301. Googlebot listened this time and stripped off the parameters as set in my settings, all except for “&lang”. Because it didn’t strip that one, the result is a page that doesn’t exist: “I think” Googlebot decided three of the four parameters should be stripped but left the lang parameter, and because of this the URL is now invalid and reports an error in my Webmaster Tools.

    So here’s my “guess” as to what’s happening: my translated cache is pretty much corrupted now and sending out invalid URLs, Googlebot keeps hammering away so the cache can’t fix itself, and my error count is steadily rising.

    The good news so far is that I am still getting new original pages indexed. But I think these translator errors are holding back my rank. What I’m really afraid of is that I’m quickly approaching a threshold where Google is just gonna say “well, too many errors, stop indexing now’ish”. Or, worse yet, wipe all my indexed pages.

    So now I’m at the point of: what do I do about this? I desperately want to fix the problem, I just don’t know the best way to go about it. I don’t think I want to use nofollow and get these subdirectories dropped altogether. But the cache can’t heal itself because the bots keep hitting the translate links, and on top of that, parameters that should be ignored are only partially being ignored. I’ve got just over 12k errors at the moment, so I’m stuck between a rock and a hard place.
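
    For illustration, the one band-aid I’ve been toying with (just a sketch; the hook and URL pattern are my guesses, nothing tested) would be to answer those junk URLs with a hard 410 so the bots eventually drop them:

    <?php
    // Sketch only, not a tested fix: catch the junk
    // "rurl=translate.google.com" URLs early and answer 410 Gone so
    // crawlers stop chasing the broken 301. Hook and pattern are assumptions.
    function myprefix_block_junk_translate_urls() {
        $uri = isset( $_SERVER['REQUEST_URI'] ) ? $_SERVER['REQUEST_URI'] : '';
        if ( strpos( $uri, 'rurl=translate.google.com' ) !== false ) {
            status_header( 410 );  // "Gone": tells bots to forget the URL
            nocache_headers();     // keep W3TC/browsers from caching the response
            exit;
        }
    }
    add_action( 'init', 'myprefix_block_junk_translate_urls' );

    But I’m not sure whether that’s the right call or whether it would just trade one pile of errors for another.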

    Please help!

    https://www.ads-software.com/extend/plugins/global-translator/

  • Would removing references to those files not help? I.e., remove the widget and remove the reference to it in your sitemap; that way your GT could catch up on the translations and the bots wouldn’t be able to find the files and index them as 404s.

    I know it’s not a fix, but it would cut out the errors to Google.

    Thread Starter schirpich

    (@schirpich)

    I thought the same thing; it makes sense. However, when I removed the widget, I noticed this new little status message popped up in the GT settings that said:

    You haven’t added the flags widget on your pages: adding the flags bar is mandatory in order to make Global Translator able to work correctly

    So that was a lost cause. I didn’t notice that little gem for about 3 days.

    It would be nice if someone could alter the plugin to rewrite to a different URL until the cached page is ready, and when it’s ready, change it to the correct URL.
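
    Something like this, maybe (just a sketch of the idea with made-up names; I haven’t looked at how GT actually builds its flag links):

    <?php
    // Sketch of the idea above with made-up names (not GT's real code):
    // link to the plain, untranslated post until the cached translation
    // exists, and only switch to the /it/-style URL once it does.
    function gt_pick_flag_url( $translated_url, $original_url, $cache_file ) {
        if ( file_exists( $cache_file ) ) {
            return $translated_url; // cache ready: use the correct translated URL
        }
        return $original_url; // not ready: keep bots on a real, existing page
    }

    That way the crawlers would never see the 302-to-Google step at all.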

  • The topic ‘[Plugin: Global Translator] Webmaster Tool Crawl Errors’ is closed to new replies.