• Resolved hans410947

    (@hans410947)


    Hello,

    I am trying to understand how the crawler feature works.

    I have submitted a sitemap to the crawler, and am now trying to run it manually. I have the crawler set to “off” in the general settings, as I wish to trigger the crawler only manually (so as not to overload my shared server).

    After I have manually started the crawler for all 4 entries in this list:

    When I go to my website after the crawling is done, only the homepage seems to be cached.

    When I run the crawler again, it still seems like only the homepage is cached:

    What should I do to make the crawler cache all the pages in the submitted sitemap?

    Thank you!

    The page I need help with: [log in to see the link]

  • pepe80

    (@pepe80)

    A single launch of the crawler is not enough to generate a cache for all pages from the sitemap, because the work is done in portions so as not to overload the server (a sketch of this batching follows the list below). For me it works like this:

    • the first launch of the crawler: the cache is generated only for the homepage (I don’t know why this happens; I see a “stopped_reset” message in the status window)
    • second launch of the crawler: a cache for 4-12 pages is generated
    • each subsequent launch of the crawler: a cache for 4-12 pages is generated
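
    A minimal sketch of this portioned behaviour (plain Python for illustration, not the plugin’s actual code): each “run” warms only a small batch of sitemap URLs, so the whole sitemap is covered only after several launches. The sitemap URL and batch size here are assumptions.

        import time
        import urllib.request
        import xml.etree.ElementTree as ET

        SITEMAP = "https://example.com/sitemap.xml"   # hypothetical sitemap URL
        BATCH_SIZE = 10                               # pages warmed per "run"

        def sitemap_urls(sitemap_url):
            """Return the <loc> entries of a plain (non-index) sitemap."""
            with urllib.request.urlopen(sitemap_url) as resp:
                root = ET.fromstring(resp.read())
            ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
            return [loc.text for loc in root.findall(".//sm:loc", ns)]

        def run_once(urls, position, batch_size=BATCH_SIZE):
            """Warm one batch starting at `position`; return the next position."""
            for url in urls[position:position + batch_size]:
                urllib.request.urlopen(url).read()    # a plain GET warms the cache
                time.sleep(1)                         # pause so a shared server is not overloaded
            return position + batch_size

        urls = sitemap_urls(SITEMAP)
        pos = 0
        while pos < len(urls):                        # each loop iteration = one manual "launch"
            pos = run_once(urls, pos)
            print(f"run finished, {min(pos, len(urls))}/{len(urls)} pages warmed")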
    Thread Starter hans410947

    (@hans410947)

    @pepe80 thank you!

    Thread Starter hans410947

    (@hans410947)

    There must be something here that I do not understand.

    If a URL on the Map tab of the Crawler is marked with a blue circle (= cache miss/404), shouldn’t that mean that the URL has now been cached by the Crawler?

    Every URL of my site, except the homepage, is marked with a blue circle, but none of those URLs seems to be cached when I check the bottom of the source code for that page.

    But as soon as I visit the URL manually, the source code shows that the page has been cached.
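
    A quicker check than reading the page source: LiteSpeed typically reports the cache result in an X-LiteSpeed-Cache response header (hit/miss), although whether that header is sent depends on the server configuration, so treat it as an assumption. A minimal sketch:

        import urllib.request

        def cache_status(url):
            # Return the X-LiteSpeed-Cache header value, if the server sends one.
            req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
            with urllib.request.urlopen(req) as resp:
                return resp.headers.get("X-LiteSpeed-Cache", "(header not present)")

        url = "https://example.com/butik/"          # hypothetical page URL
        print("first visit :", cache_status(url))  # typically "miss" on an uncached page
        print("second visit:", cache_status(url))  # typically "hit", since the first visit cached it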

    Thanks for any help.

    Plugin Support qtwrk

    (@qtwrk)

    please provide the report number

    you can get it in Toolbox -> Report -> click “Send to LiteSpeed”

    Thread Starter hans410947

    (@hans410947)

    Great, thank you,

    Report number: RCOOSVIQ

    Report date: 06/28/2023 06:33:49

    Plugin Support qtwrk

    (@qtwrk)

    is *.*.216.78 your server IP?

    and please try setting Crawler Interval and Run Interval to a lower time, like 61, for test purposes

    once you set them to 61, try to manually start it (you may need to click it a few times); after the crawler finishes all pages, if you immediately open any of these pages, does it give you a hit?
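
    To verify this across all crawled pages at once, a tally like the following could help; it assumes the same X-LiteSpeed-Cache response header as above, and the URL list is a hypothetical stand-in for the crawled sitemap entries.

        import urllib.request

        urls = [
            "https://example.com/",         # hypothetical list; in practice, take
            "https://example.com/butik/",   # the URLs from the crawled sitemap
        ]

        hits = 0
        for url in urls:
            req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
            with urllib.request.urlopen(req) as resp:
                status = resp.headers.get("X-LiteSpeed-Cache", "none")
            print(url, "->", status)
            hits += status.lower() == "hit"
        print(f"{hits}/{len(urls)} pages served from cache")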

    Thread Starter hans410947

    (@hans410947)

    Hi,

    yes, that is my IP address according to my web host. But when I go to General Settings > Server IP and click on the “check my public IP from DoAPlus” link, I get an address in a different format, ending with *:*:8:5::a?

    Should I be using that address instead of the one I am currently using in LiteSpeed plugin?
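
    For what it’s worth, the two formats are simply IPv4 versus IPv6; one host can have both kinds of address. A small lookup sketch (the hostname is a placeholder):

        import socket

        host = "example.com"   # placeholder; use your own domain
        for family, label in ((socket.AF_INET, "IPv4 (A)"), (socket.AF_INET6, "IPv6 (AAAA)")):
            try:
                infos = socket.getaddrinfo(host, None, family)
                # Collect the distinct addresses from the returned tuples.
                print(label, sorted({info[4][0] for info in infos}))
            except socket.gaierror:
                print(label, "no record")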

    I have now done a Purge All, and then run the Crawler as you suggested. There are a total of 24 crawlers, and here is how the first ones look after crawling completed:

    All the other URLs look the same, meaning that only the homepage has green circles; all other pages have blue circles.

    When I then go to https://check.lscache.io/ and check some of the pages that have blue circles, this is what happens when I enter a URL and click on the “Check” button:

    After I click on the Check button for the first time:

    If I click the Check button once again for the same URL this happens:

    And if I click the Check button a third time for the same URL this happens:

    These are my conclusions:

    When the crawler has marked a URL with a blue circle, that URL is still not cached.

    When I click the Check button a second time in the tool above, the page gets cached, and when I click the Check button a third time, the page gets cached in the QUIC.cloud CDN.

    Is this correct?

    What should I do to make the Crawler cache all the pages it visits, both in the cache plugin and in the CDN?
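
    Both layers can be inspected in one request by printing both headers; X-LiteSpeed-Cache (local cache) and X-QC-Cache (QUIC.cloud CDN) are the names these services commonly send, but both are assumptions to verify against your own responses.

        import urllib.request

        url = "https://example.com/butik/"   # hypothetical page URL
        req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urllib.request.urlopen(req) as resp:
            # A missing header may only mean it is not enabled,
            # not necessarily that the page is uncached.
            for name in ("X-LiteSpeed-Cache", "X-QC-Cache"):
                print(name, "->", resp.headers.get(name, "(not present)"))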

    Thank you!

    Plugin Support qtwrk

    (@qtwrk)

    in the Server IP field, please keep it as *.*.216.78

    if you have multiple crawlers, you will need to wait for all the crawlers to finish; if you have a large page list, then use one of the sub-sitemaps to test, for example crawl only 10 or 20 pages, and then verify those first

    the checker site does not send requests the way a real browser or crawler does, so please use a real browser to check the crawler results.
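
    The point about the checker site can be made concrete: the cache may vary on request headers, so a bare request and a browser-like request can land on different cache entries. A rough comparison (the header values are illustrative, and the cache header is the same assumption as above):

        import urllib.request

        def fetch_status(url, headers):
            req = urllib.request.Request(url, headers=headers)
            with urllib.request.urlopen(req) as resp:
                return resp.headers.get("X-LiteSpeed-Cache", "(no cache header)")

        url = "https://example.com/butik/"   # hypothetical page URL
        bare = {}                            # urllib will send its own "Python-urllib" User-Agent
        browser_like = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
            "Accept": "text/html,application/xhtml+xml",
        }
        print("bare request   :", fetch_status(url, bare))
        print("browser request:", fetch_status(url, browser_like))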

    Thread Starter hans410947

    (@hans410947)

    Thank you @qtwrk for helping.

    I added a smaller sitemap and ran the crawler.

    Every page that was crawled got a blue circle on the map tab (except for one version of the homepage, which got a green circle).

    When I visit any of the blue-circle pages in a private web browser, it shows a miss for both the cache and QUIC.cloud.

    Please see images below for one of the pages I tested (the “butik” page):

    Plugin Support qtwrk

    (@qtwrk)

    I might have a couple of theories about this case, but I still need to check further

    please create a ticket by mail to support at litespeedtech.com

    with a reference link to this topic, and we will investigate further.

  • The topic ‘Crawling only works for homepage’ is closed to new replies.