• I just found out that page caching causes a 403 error to be returned to the Screaming Frog SEO crawler. I tested in Google Search Console and got the same result: if Page Caching is turned on, crawlers get a 403 error; if I turn it off, they load the page correctly. Strangely enough, visitors always see the page correctly.

    I have tried various Page Caching options, but none of them made any difference. In the end I found this rule in .htaccess, which apparently breaks the crawlers when it is processed:

    RewriteRule .* "/wp-content/cache/page_enhanced/%{HTTP_HOST}/%{REQUEST_URI}/_index%{ENV:W3TC_PREVIEW}.html" [L]

    When Page Caching is turned off, either globally or for a certain page, this rule is of course not processed, so the crawler gets status 200 and the page loads correctly. But when the rule is processed, it results in a 403.

    I have checked the path on the server, and the 403 occurs whenever the cached file exists. When I manually deleted the cached file for a certain page, the crawler got status 200 for that page, apparently because the page was generated fresh and then stored in the cache. But on the second try immediately afterwards, the cached page was already there and a 403 was returned again.
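For illustration, the file that the rule above rewrites to can be sketched in Python. This is only a rough model of the substitution: it assumes repeated slashes are treated as one when the path is resolved, and the helper name `cache_path` and the host `example.com` are made up for the example.

```python
import re

# Directory prefix taken from the RewriteRule quoted above.
CACHE_ROOT = "/wp-content/cache/page_enhanced"


def cache_path(http_host: str, request_uri: str, preview: str = "") -> str:
    """Hypothetical helper mirroring the RewriteRule substitution.

    http_host   stands in for %{HTTP_HOST}
    request_uri stands in for %{REQUEST_URI}
    preview     stands in for %{ENV:W3TC_PREVIEW} (empty when unset)
    """
    raw = f"{CACHE_ROOT}/{http_host}/{request_uri}/_index{preview}.html"
    # Assumption: runs of slashes behave like a single slash.
    return re.sub(r"/{2,}", "/", raw)


print(cache_path("example.com", "/blog/"))
```

If a file exists at the printed path under the document root, that is the file served instead of the dynamically generated page, which matches the behaviour described above.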

    Has anyone experienced the same problem? Thanks

    https://www.ads-software.com/plugins/w3-total-cache/

Viewing 2 replies - 1 through 2 (of 2 total)
  • Never heard of that problem from anyone before. So you are using “Disk: Enhanced”, I see. Do you by chance have Firebug? I would love to know what the Request Headers are for these 403 Forbidden pages.

    Curious, what happens when you try to retrieve the .html page directly from the cache via your browser address bar. e.g. https://<domain.com>/wp-content/cache/page_enhanced/<domain.com>/<page name>/_index.html

    Do you still get a 403 forbidden? Ideally, try retrieving this page in a new private/incognito window.

    I ask because I wonder if your .htaccess or server is configured in a way that makes relative addressing (via the RewriteRule) a problem.

    By the way, I assume you didn’t privacy protect one of the parent folders that ultimately lead up to /cache. Protecting a folder triggers a login request when someone tries to access that directory (or one of its children). This is one way to trigger a 403 Forbidden.
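One way to look at the request-header question without a crawler is to fetch the same URL with different User-Agent headers and compare the status codes. The following is a minimal sketch using only the Python standard library; the function names and the "browser-like" User-Agent string are placeholders invented for the example, and it only tests the assumption that the User-Agent header is what differs between visitors and crawlers.

```python
import urllib.error
import urllib.request

# Illustrative User-Agent strings; the browser-like one is a made-up placeholder.
USER_AGENTS = {
    "browser-like": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "googlebot-like": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Create a GET request that carries the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, including error codes such as 403."""
    req = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; the code is still the answer we want.
        return err.code


def compare_statuses(url: str) -> dict:
    """Fetch url once per User-Agent and map each label to its status code."""
    return {label: fetch_status(url, ua) for label, ua in USER_AGENTS.items()}
```

Calling `compare_statuses(...)` with the page URL and again with the direct `_index.html` cache URL would show whether the 403 depends on the User-Agent. If the codes are identical, some other request header or a server-side rule is the more likely cause.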

    Thread Starter Jan Zikmund

    (@verify)

    Hi Kimberly, thanks for getting back to me. Yeah I use “Disk:Enhanced”.

    Well, I don’t really know what is going on. Regarding the headers, you can check for yourself: I have turned page cache on, and the URL is https://kiallafoods.com.au/blog/ . It opens in the browser, but Screaming Frog gave a 403. Same for “Fetch as Google” in Google Search Console.

    Then I tried accessing the page in the cache directly: https://kiallafoods.com.au/wp-content/cache/page_enhanced/kiallafoods.com.au/blog/_index.html , and surprisingly, I could access it in Screaming Frog, but in the browser it gave a 404.

    I went through the directory tree, and after I commented out the .htaccess in /wp-content/cache/page_enhanced , I could visit the cached page directly even in the browser. But then I reverted the .htaccess and I can still access the cached page in the browser, even though it didn’t work before.

    This is the content of that .htaccess just in case:
    # BEGIN W3TC Page Cache cache
    AddDefaultCharset UTF-8
    <IfModule mod_expires.c>
    ExpiresActive On
    ExpiresByType text/html M3600
    </IfModule>
    <IfModule mod_headers.c>
    Header set Pragma "public"
    Header append Cache-Control "public"
    </IfModule>
    # END W3TC Page Cache cache

    So now it all seems to work fine; even the other pages that gave a 403 before seem accessible to crawlers. Unfortunately I don’t have any more time to spend on this, as the client is not paying for it and it seems fine for now. Still, I would be curious if you had any logical explanation. In particular I am wondering whether this issue may reappear, but I guess we will just see. If it happens again, I will probably just turn off page caching and leave it at that.

    Thanks for your time

  • The topic ‘Page caching results in 403 in crawlers (google, Screaming frog)’ is closed to new replies.