[Plugin: Global Translator] Webmaster Tool Crawl Errors
I’ve got a growing problem on my hands with crawl errors for pages not being found. Here is my configuration at this point.
– WordPress 2.9.2
– W3TC using Xcache
– Global Translator Plugin

Lately I’ve been watching my Webmaster Tools crawl errors increase on a daily basis. URIs are showing up like “https://www.MYdomainOMITTED.net/2009/07/pandora-one-review/&rurl=translate.google.com&lang=en&usg=ALkJrhhtYNwzqnBC5c96Tmkvo4QCXhcrmg%2F/page/2/” and then being reported as NOT FOUND.
That is what you would get if you visited my site and tried to translate one of my posts into another language. When the requested translation has not already been cached, the plugin sends the user’s browser through a 302 redirect to fetch a current translation. It used to send a 503, but the developer of Global Translator updated the plugin to use a 302 instead.
If the page had in fact already been cached, the URI would look like “https://www.MYdomainOMITTED.net/it/2009/07/pandora-one-review/” (notice the /it subdirectory). WordPress and the Global Translator plugin would check, see that the translation has already been cached, and instead of 302-redirecting to Google, serve the cached file already sitting in my Global Translator cache folder. And everything is gravy.
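To make that concrete, here is roughly what I understand the plugin to be doing. The function name, cache path, and file-naming scheme below are my guesses at its internals, not its actual code:

<?php
// A sketch of the cached-vs-uncached behavior described above.
// myprefix_serve_translation(), the gt-cache path, and the md5-based
// file names are assumptions, not Global Translator's real internals.

function myprefix_serve_translation( $lang, $permalink ) {
    // e.g. /it/2009/07/pandora-one-review/ maps to some file in the cache
    $cached = ABSPATH . 'wp-content/gt-cache/' . $lang . '-' . md5( $permalink ) . '.html';

    if ( file_exists( $cached ) ) {
        // Already cached: serve the stored translation with a normal 200.
        header( 'Content-Type: text/html; charset=UTF-8' );
        readfile( $cached );
        exit;
    }

    // Not cached yet: 302 the visitor over to Google Translate so they
    // still get a translated page while the cache is (eventually) built.
    wp_redirect(
        'https://translate.google.com/translate?hl=en&sl=en&tl=' . $lang
        . '&u=' . rawurlencode( $permalink ),
        302
    );
    exit;
}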
However, all the crawlers accessing my site are hitting these links so hard that my WordPress backend is completely unable to cache the pages; the volume is already being consumed by the crawlers themselves. So my GT (I’ll refer to Global Translator as just GT to save time) plugin status always shows that I am blocked for a period of time, and while it’s blocked no new cache is being created. The search engine crawlers (including Googlebot) are still hammering away, so they only ever see the 302 redirect, can’t index these pages, and keep filling my Webmaster Tools with errors.
At first I thought I could just add those parameters in Webmaster Tools and Google would simply know it shouldn’t touch them anymore. Nope, still happening just as much as before. So I next ran a test with the “Fetch as Googlebot” tool to see what’s actually being served.
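(If you want to reproduce these checks yourself without Webmaster Tools, a short PHP script using the curl extension shows the same raw headers. This is just a convenience sketch; the URL is the first test page below.)

<?php
// Print the raw response headers for a URL, the way Fetch as Googlebot
// reports them. Deliberately NOT following redirects so the 302 itself
// is visible rather than the page it points to.

$url = 'https://www.MYdomainOMITTED.net/cs/2010/04/how-to-compile-source-code/';

$ch = curl_init( $url );
curl_setopt( $ch, CURLOPT_NOBODY, true );         // HEAD-style request, no body
curl_setopt( $ch, CURLOPT_HEADER, true );         // include headers in the output
curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true ); // return instead of printing

echo curl_exec( $ch );
curl_close( $ch );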
First test: a new page I know has not been cached by GT
URL: https://www.MYdomainOMITTED.net/cs/2010/04/how-to-compile-source-code/
===========================================================
HTTP/1.1 302 Moved Temporarily
Date: Sat, 17 Apr 2010 21:03:06 GMT
Server: Apache
X-Powered-By: W3 Total Cache/0.8.5.2
Set-Cookie: PHPSESSID=hds7sh6oujumkehjj00cac42; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
X-Pingback: https://www.MYdomainOMITTED.net/xmlrpc.php
Location: https://translate.google.com/translate?hl=en&sl=en&tl=cs&u=http%3A%2F%2Fwww.MYdomainOMITTED.net%2F2010%2F04%2Fhow-to-compile-source-code%2F
Keep-Alive: timeout=2, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

Second test: a post I know is in fact in my cache the way it should be.
URL: https://www.MYdomainOMITTED.net/cs/what-do-you-want-from-me/
===================================================
HTTP/1.1 200 OK
Date: Sat, 17 Apr 2010 21:05:07 GMT
Server: Apache
X-Powered-By: W3 Total Cache/0.8.5.2
Set-Cookie: PHPSESSID=0sd3eras2q4eiqk2hm198f86; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
X-Pingback: https://www.MYdomainOMITTED.net/xmlrpc.php
Keep-Alive: timeout=2, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
[…the rest of the HTML as it should be…]

Here is where all hell breaks loose. This is one of the latest crawl errors in my Webmaster Tools, a 404.
URL: https://www.MYdomainOMITTED.net/2009/07/pandora-one-review/&rurl=translate.google.com&lang=en&usg=ALkJrhhtYNwzqnBC5c96Tmkvo4QCXhcrmg%2F/page/2/
===================================================
HTTP/1.1 301 Moved Permanently
Date: Sat, 17 Apr 2010 21:38:01 GMT
Server: Apache
X-Powered-By: W3 Total Cache/0.8.5.2
Set-Cookie: PHPSESSID=79e3g0to4rmjt4ogqtbu5e7; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Location: https://www.MYdomainOMITTED.net/pandora-one-review/&lang=en
Keep-Alive: timeout=2, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html

Now that’s a 301. Googlebot listened this time and stripped off the parameters I had set in my settings, all except for “&lang”. Because it didn’t strip that one, the result is a page that doesn’t exist. My guess is that Googlebot decided three of the four parameters should be stripped but left the lang parameter alone; because of this, the URL is now invalid and reports an error in my Webmaster Tools.
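Part of why the parameter handling goes sideways, I suspect, is that these junk URLs contain no “?” at all, so to a URL parser the “&rurl/&lang/&usg” pieces are part of the path, not real query parameters. A quick PHP check illustrates this:

<?php
// The malformed URL from above (domain obfuscated as in the rest of
// this post). Note there is no "?" anywhere in it.
$bad = 'https://www.MYdomainOMITTED.net/2009/07/pandora-one-review/'
     . '&rurl=translate.google.com&lang=en'
     . '&usg=ALkJrhhtYNwzqnBC5c96Tmkvo4QCXhcrmg%2F/page/2/';

$parts = parse_url( $bad );
var_dump( isset( $parts['query'] ) );  // bool(false) -- no query string at all
echo $parts['path'], "\n";             // the "&..." junk all lives in the path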
So here’s my guess as to what’s happening: my translation cache is pretty much corrupted now and sending out invalid URLs, Googlebot keeps hammering away so the cache can’t fix itself, and my error log is steadily rising.
The good news so far is that I am still getting new original pages indexed. But I think these translator errors are holding back my rankings. What I’m really afraid of is that I’m quickly approaching a threshold where Google is just going to say “well, too many errors, stop indexing now’ish”, or worse yet, wipe all my indexed pages.
So now I’m at the point of asking: what do I do about this? I desperately want to fix the problem; I just don’t know the best way to go about it. I don’t think I want to use nofollow and get these subdirectories dropped altogether. But the cache can’t heal itself because the bots keep hitting the translate links, and on top of that, parameters that shouldn’t be ignored are being partially ignored. I’ve got just over 12k errors at the moment, so I’m stuck between a rock and a hard place.
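The only stopgap I’ve come up with so far is a small hook that 301s anything containing the “/&rurl=” junk back to the clean permalink, so crawlers stop piling up 404s while the cache rebuilds. This is a sketch only, untested; the hook is standard WordPress, but the string match is just my guess at the shape of these URLs:

<?php
// Untested sketch: redirect requests whose path contains the "&rurl="
// junk back to the clean permalink with a 301.

function myprefix_strip_rurl_junk() {
    $uri = $_SERVER['REQUEST_URI'];
    $pos = strpos( $uri, '/&rurl=' );
    if ( false !== $pos ) {
        // Keep everything up to and including the slash before "&rurl=".
        wp_redirect( substr( $uri, 0, $pos + 1 ), 301 );
        exit;
    }
}
add_action( 'template_redirect', 'myprefix_strip_rurl_junk' );

If anyone knows a better way, or whether GT itself can be made to stop emitting these, I’m all ears.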
please help!
https://www.ads-software.com/extend/plugins/global-translator/