• Resolved pepe80

    (@pepe80)


    Hi,
    I see many entries in the access log file:

    
    2a03:2880:10ff:16::face:b00c - - [18/Sep/2020:13:39:26 +0200] "GET /some-page/ HTTP/1.1" 200 19153 "-" "facebookexternalhit/1.1 (+https://www.facebook.com/externalhit_uatext.php)"
    

    Why doesn’t LiteSpeed create a cache entry for “/some-page/”? When I type this address into the browser a few minutes later, I have to wait a few seconds (the cache is not working). When I refresh the page (or open it on another computer), it appears immediately (the cache works).

    Another example – Slack’s bot:

    
    3.89.134.152 - - [18/Sep/2020:15:01:05 +0200] "GET /site-test/ HTTP/1.1" 200 15155 "-" "Slackbot-LinkExpanding 1.0 (+https://api.slack.com/robots)"
    

    Again, LiteSpeed doesn’t create a cache for this page. And when I fetch the page with the wget command:

    
    wget https://mysite.com/site-test/
    

    …again, LiteSpeed doesn’t create a cache for it. I have checked all the options in the plugin and can’t find anything relevant (“Do Not Cache User Agents” -> this field is empty).

    WordPress: 5.5.1
    LiteSpeed Cache: 3.4.2

  • Plugin Support qtwrk

    (@qtwrk)

    Hi,

    You may need to use a full, browser-like set of request headers for the cache to be generated properly, e.g.:

    curl 'https://www.ads-software.com/support/topic/why-facebooks-crawler-doesnt-create-a-cache/' \
      -H 'authority: www.ads-software.com' \
      -H 'pragma: no-cache' \
      -H 'cache-control: no-cache' \
      -H 'upgrade-insecure-requests: 1' \
      -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36' \
      -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' \
      -H 'sec-fetch-site: same-origin' \
      -H 'sec-fetch-mode: navigate' \
      -H 'sec-fetch-user: ?1' \
      -H 'sec-fetch-dest: document' \
      -H 'referer: https://www.ads-software.com/support/plugin/litespeed-cache/unresolved/' \
      -H 'accept-language: en,es;q=0.8,es-ES;q=0.7,es;q=0.6,en-US;q=0.5' \
      -H 'cookie: xxx' \
      --compressed
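
    To confirm whether a given response was actually served from cache, check the X-LiteSpeed-Cache response header (“hit” means served from cache, “miss” means freshly generated). A quick sketch, with mysite.com as a placeholder:

    # -D - dumps the response headers to stdout while the body is discarded,
    # so a normal GET is performed and the page can still be cached
    curl -s -o /dev/null -D - 'https://mysite.com/site-test/' | grep -i 'x-litespeed-cache'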

    Best regards,

    Thread Starter pepe80

    (@pepe80)

    Now I can see that it does create a cache after all, but a separate one for “wget” and another for the “browser”:

    1. wget https://mysite.com/site-test/
    timestamp: <!-- Page generated by LiteSpeed Cache 3.4.2 on 2020-09-18 17:31:38 -->

    2. browser: https://mysite.com/site-test/
    timestamp: <!-- Page generated by LiteSpeed Cache 3.4.2 on 2020-09-18 17:33:11 -->

    3. wget https://mysite.com/site-test/
    timestamp: <!-- Page generated by LiteSpeed Cache 3.4.2 on 2020-09-18 17:31:38 -->

    Why is that?

    Plugin Support qtwrk

    (@qtwrk)

    Hi,

    If you are using Chrome and you have enabled WebP image replacement, then it will create two caches for the same page: one version for WebP and one for non-WebP.
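
    You can see the two copies directly by varying only the User-Agent (a sketch; it assumes the WebP vary is keyed off the UA string, and mysite.com is a placeholder):

    # Chrome-like UA: should be served the WebP variant of the cache
    curl -s -A 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36' \
      'https://mysite.com/site-test/' | grep 'Page generated'

    # wget-like UA with no WebP support: served the other variant, so the
    # two "Page generated" timestamps can legitimately differ
    curl -s -A 'Wget/1.20.3 (linux-gnu)' \
      'https://mysite.com/site-test/' | grep 'Page generated'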

    Best regards,

    Thread Starter pepe80

    (@pepe80)

    Thank you for the quick reply. Let’s reverse the situation:

    1. clear cache.

    2. browser firefox: https://mysite.com/site-test/
    Page generated by LiteSpeed Cache 3.4.2 on 2020-09-18 18:02:07

    3. wget https://mysite.com/site-test/
    Page generated by LiteSpeed Cache 3.4.2 on 2020-09-18 18:04:15

    Plugin Support qtwrk

    (@qtwrk)

    Hi,

    Yeah, that was actually a tricky one, and I managed to reproduce it on my test site.

    Upon further checking of the server log, I believe the cause is compression: Firefox accepts brotli compression, while wget doesn’t accept any, so each request is served and cached as a different variant.
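
    You can confirm this by pinning the Accept-Encoding header yourself (a sketch; it assumes the brotli CLI is installed for decompression, and mysite.com is a placeholder):

    # Same vary key as Firefox: request the brotli-compressed copy.
    # curl does not auto-decode a manually requested encoding, so pipe
    # the body through the brotli CLI.
    curl -s -H 'Accept-Encoding: br' 'https://mysite.com/site-test/' | brotli -d -c | grep 'Page generated'

    # No Accept-Encoding at all, like plain wget: a separately cached copy,
    # so the timestamp can differ from the browser's
    curl -s 'https://mysite.com/site-test/' | grep 'Page generated'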

    Best regards,

    Thread Starter pepe80

    (@pepe80)

    Thank you for your explanation.
    Is there any way to make sure that Facebook’s crawler is getting a cached page?

    2a03:2880:10ff:16::face:b00c - - [18/Sep/2020:13:39:26 +0200] "GET /some-page/ HTTP/1.1" 200 19153 "-" "facebookexternalhit/1.1 (+https://www.facebook.com/externalhit_uatext.php)"

    Plugin Support qtwrk

    (@qtwrk)

    Hi,

    Can you manually trigger the FB crawler?

    If so, you can try using curl or wget to mimic its UA and see if it works.
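
    For example (a sketch; the UA string is copied from your access log, mysite.com is a placeholder, and the X-LiteSpeed-Cache response header reports hit/miss):

    # Run twice: the first request may show "miss" (and primes the cache),
    # the second should show "hit" if the crawler's requests are cacheable
    curl -s -o /dev/null -D - \
      -A 'facebookexternalhit/1.1 (+https://www.facebook.com/externalhit_uatext.php)' \
      'https://mysite.com/site-test/' | grep -i 'x-litespeed-cache'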

    Best regards,

  • The topic ‘Why Facebook’s crawler doesn’t create a cache?’ is closed to new replies.