• Resolved billzt

    (@billzt)


    report number is:  WTOUJIQV

    No matter how many times I tried to start the crawlers manually, they always finished in a very short time and showed blue icons (missing)

    I also have a debug log file, but I don’t know how to attach it.

    • This topic was modified 9 months, 3 weeks ago by billzt.
Viewing 14 replies - 1 through 14 (of 14 total)
  • Plugin Support qtwrk

    (@qtwrk)

    I see 2 IPs there: xx.xxx.146.143 and xxx.xxx.44.106. Which one is the origin server?

    Make sure that General -> Server IP is set to the correct IP.

    Thread Starter billzt

    (@billzt)

    Thank you for your response.

    xxx.xxx.44.106 is the origin server IP. I have seen it in the report.

    LSCache Plugin Options
    _version = 6.0.0.1
    hash = JrfxxxxxxxxxxxxxxxxxxxYlDs3bQtI
    auto_upgrade = true
    api_key = B819xxxxxxxxxxxxxxxxDE8
    server_ip = xxx.xxx.44.106
    guest = false
    guest_optm = false
    news = true
    guest_uas = array (
    0 => 'Lighthouse',
    1 => 'GTmetrix',
    2 => 'Google',
    3 => 'Pingdom',
    4 => 'bot',
    5 => 'PTST',
    6 => 'HeadlessChrome',
    )

    xxx.xxx.146.143 is wrong (in fact, it seems to be the IP address I used two years ago), and I haven’t found it in the report. Could you let me know where you noticed this IP?

    • This reply was modified 9 months, 3 weeks ago by billzt.
    Plugin Support qtwrk

    (@qtwrk)

    Try setting the crawler interval to 600, then go to the sitemap settings, set drop domain to OFF, and refresh the sitemap.

    Run the crawler twice in a row: after the first run finishes, wait 10 minutes and run it again, then see what it shows.

    Thread Starter billzt

    (@billzt)

    My sitemap contains 381 URLs in total.

    In this test, I found that every time, the first 57 URLs are green (hit), while all the others are blue (missing).

    However, even for URLs that are shown in blue, curl -I showed x-litespeed-cache: hit

    It should be noted that my website uses the Cloudflare CDN, and this issue began when I set up a new server and copied over all the old WordPress files (of course, I didn’t copy the directory where the cache data is located). On the old server, the crawlers worked well without setting “drop domain”.

    • This reply was modified 9 months, 3 weeks ago by billzt.
    Plugin Support qtwrk

    (@qtwrk)

    curl -I won’t behave like the crawler. You need to send the full Chrome desktop or mobile headers, including accept-encoding, user-agent and accept, to mimic the crawler’s request.

    But did it work with the drop domain setting?
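A minimal sketch of the kind of request described above, using Python's standard library. The header values are illustrative assumptions modelled on a desktop Chrome browser, not the crawler's actual headers, and `build_request` and `cache_status` are hypothetical helper names.

```python
import urllib.request

# Assumed browser-like headers; the real crawler's exact values may differ.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, br",
}


def build_request(url):
    """Build a GET request that carries the browser-like headers."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS, method="GET")


def cache_status(url, timeout=10):
    """Send the request and return the x-litespeed-cache header, or None."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        # Only the response headers are inspected; the body is not decoded.
        return resp.headers.get("x-litespeed-cache")
```

Calling `cache_status("https://springwood.me/hello-world/")` would return the value of the cache header if the server sends one. Because cached copies can vary by request headers, the result may differ from what a bare `curl -I` reports.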

    Thread Starter billzt

    (@billzt)

    I’m not sure. But currently my drop domain is set to off, and the domain is included in the URLs.

    I tried to initiate the crawlers manually, but they finished quickly within 1 second.

    Start watching...
    06 Jan 2024 10:17:45     Size: 381     Crawler: #1     Position: 1     Threads: 1     Status: crawling, prepare running
    06 Jan 2024 10:18:28     Size: 381     Crawler: #1     Position: 1     Threads: 1     Status: end
    ..

    No matter how many times I manually start the crawlers, the first 63 URLs are always green and the remaining 318 URLs are blue. Why?

    Plugin Support qtwrk

    (@qtwrk)

    Try setting drop domain to ON, and in General -> Server IP, make sure it is the correct one. Then go to Toolbox -> Debug Settings -> enable the debug log, run the crawler, and in the “Log View” tab click “Crawler Log” to see what the crawler received.

    Thread Starter billzt

    (@billzt)

    If I set drop domain to ON, then all the URLs are blue (missing).

    If I set drop domain to OFF, then 57 URLs are green and the others are blue.

    My domain is using the Cloudflare CDN. Is this the reason why it behaves so strangely?

    Plugin Support qtwrk

    (@qtwrk)

    It could be, but when drop domain is ON, the crawler should bypass CF and connect directly to the origin IP.

    That’s why I asked you to check the crawler log, to see what the crawler received in response.

    Thread Starter billzt

    (@billzt)

    Well, the log is like this. All missing.

    I replaced the real IP address with xxx.xxx

    However, it should be noted that my origin server doesn’t have any SSL certificate set up. I just rely on CF. This means that if I turned off CF, users could not visit my URLs such as https://springwood.me/hello-world/ . Is this the reason, and should I set up an SSL certificate on my origin server?

    01/06/24 09:38:47.341 [xxx.xxx.44.106:28120 1 DbU] [Router] parsed type: crawler_force
    01/06/24 09:38:47.341 [xxx.xxx.44.106:28120 1 DbU] ? type=crawler_force
    01/06/24 09:38:47.341 [xxx.xxx.44.106:28120 1 DbU] ??? ------------async-------------start_async_handler
    01/06/24 09:38:47.341 [xxx.xxx.44.106:28120 1 DbU] ??? ......crawler manually ran......
    01/06/24 09:38:47.348 [xxx.xxx.44.106:28120 1 DbU] ??? Init w/ CPU cores=2
    01/06/24 09:38:47.348 [xxx.xxx.44.106:28120 1 DbU] ??? ......crawler started......
    01/06/24 09:38:47.354 [xxx.xxx.44.106:28120 1 DbU] ??? Server load: 0.69970703125
    01/06/24 09:38:47.362 [xxx.xxx.44.106:28120 1 DbU] ??? ini_get max_execution_time=30
    01/06/24 09:38:47.362 [xxx.xxx.44.106:28120 1 DbU] ??? ini_set max_execution_time=600
    01/06/24 09:38:47.363 [xxx.xxx.44.106:28120 1 DbU] ??? final max_execution_time=600
    01/06/24 09:38:47.425 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/hello-world/ [ori] /hello-world/
    01/06/24 09:38:47.426 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /hello-world/
    01/06/24 09:38:47.426 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/got-yongqi/ [ori] /got-yongqi/
    01/06/24 09:38:47.427 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /got-yongqi/
    01/06/24 09:38:47.428 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/2002-2002/ [ori] /2002-2002/
    01/06/24 09:38:47.428 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /2002-2002/
    01/06/24 09:38:47.435 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/qq-360/ [ori] /qq-360/
    01/06/24 09:38:47.436 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /qq-360/
    01/06/24 09:38:47.436 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/nanjing-tree/ [ori] /nanjing-tree/
    01/06/24 09:38:47.437 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /nanjing-tree/
    01/06/24 09:38:47.438 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/blog-100days/ [ori] /blog-100days/
    01/06/24 09:38:47.438 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /blog-100days/
    01/06/24 09:38:47.439 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/mouse-middle-button/ [ori] /mouse-middle-button/
    01/06/24 09:38:47.439 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /mouse-middle-button/
    01/06/24 09:38:47.440 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/update-ubuntu-after-spring-festival/ [ori] /update-ubuntu-after-spring-festival/
    01/06/24 09:38:47.440 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /update-ubuntu-after-spring-festival/
    01/06/24 09:38:47.441 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/mobile-j108i/ [ori] /mobile-j108i/
    01/06/24 09:38:47.442 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /mobile-j108i/
    01/06/24 09:38:47.442 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/parallel-sentences/ [ori] /parallel-sentences/
    01/06/24 09:38:47.444 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /parallel-sentences/
    01/06/24 09:38:47.444 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/ching-ming/ [ori] /ching-ming/
    01/06/24 09:38:47.445 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /ching-ming/
    01/06/24 09:38:47.446 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/tancheng-luzhou-earthquake-line/ [ori] /tancheng-luzhou-earthquake-line/
    01/06/24 09:38:47.446 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /tancheng-luzhou-earthquake-line/
    01/06/24 09:38:47.447 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/univeristy-entrance-exam-2006/ [ori] /univeristy-entrance-exam-2006/
    01/06/24 09:38:47.447 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /univeristy-entrance-exam-2006/
    01/06/24 09:38:47.448 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/biopython-dbfetch/ [ori] /biopython-dbfetch/
    01/06/24 09:38:47.448 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /biopython-dbfetch/
    01/06/24 09:38:47.449 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/qq-webqq/ [ori] /qq-webqq/
    01/06/24 09:38:47.449 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /qq-webqq/
    01/06/24 09:38:47.450 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/jian-guo-yun/ [ori] /jian-guo-yun/
    01/06/24 09:38:47.450 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /jian-guo-yun/
    01/06/24 09:38:47.451 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/blog-suffer-hack/ [ori] /blog-suffer-hack/
    01/06/24 09:38:47.451 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /blog-suffer-hack/
    01/06/24 09:38:47.452 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/cpan/ [ori] /cpan/
    01/06/24 09:38:47.453 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /cpan/
    01/06/24 09:38:47.453 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/tutor/ [ori] /tutor/
    01/06/24 09:38:47.454 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /tutor/
    01/06/24 09:38:47.454 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/human-phylo/ [ori] /human-phylo/
    01/06/24 09:38:47.455 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /human-phylo/
    01/06/24 09:38:47.455 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/trinityigv-blat/ [ori] /trinityigv-blat/
    01/06/24 09:38:47.456 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /trinityigv-blat/
    01/06/24 09:38:47.456 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/install-r-packages/ [ori] /install-r-packages/
    01/06/24 09:38:47.457 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /install-r-packages/
    01/06/24 09:38:47.457 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/blog-suffer-ddos/ [ori] /blog-suffer-ddos/
    01/06/24 09:38:47.458 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /blog-suffer-ddos/
    01/06/24 09:38:47.459 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/rss-reader-choice/ [ori] /rss-reader-choice/
    01/06/24 09:38:47.459 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /rss-reader-choice/
    01/06/24 09:38:47.460 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/codon-fold/ [ori] /codon-fold/
    01/06/24 09:38:47.460 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /codon-fold/
    01/06/24 09:38:47.461 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/hotmail-breakdown/ [ori] /hotmail-breakdown/
    01/06/24 09:38:47.461 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /hotmail-breakdown/
    01/06/24 09:38:47.462 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/bioinformatics-windows-pc/ [ori] /bioinformatics-windows-pc/
    01/06/24 09:38:47.462 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /bioinformatics-windows-pc/
    01/06/24 09:38:47.463 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/sun-zhencai-1/ [ori] /sun-zhencai-1/
    01/06/24 09:38:47.463 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /sun-zhencai-1/
    01/06/24 09:38:47.464 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/sun-zhencai/ [ori] /sun-zhencai/
    01/06/24 09:38:47.465 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /sun-zhencai/
    01/06/24 09:38:47.465 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/shenzhen-visa/ [ori] /shenzhen-visa/
    01/06/24 09:38:47.466 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /shenzhen-visa/
    01/06/24 09:38:47.466 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/office-tab-plugin/ [ori] /office-tab-plugin/
    01/06/24 09:38:47.467 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /office-tab-plugin/
    01/06/24 09:38:47.468 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/textbook-math/ [ori] /textbook-math/
    01/06/24 09:38:47.468 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /textbook-math/
    01/06/24 09:38:47.469 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/typhoon-shanzu/ [ori] /typhoon-shanzu/
    01/06/24 09:38:47.469 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /typhoon-shanzu/
    01/06/24 09:38:47.470 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/2018-final/ [ori] /2018-final/
    01/06/24 09:38:47.470 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /2018-final/
    01/06/24 09:38:47.471 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/hongkong-qman-enquiry/ [ori] /hongkong-qman-enquiry/
    01/06/24 09:38:47.471 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /hongkong-qman-enquiry/
    01/06/24 09:38:47.472 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/bing-break-down/ [ori] /bing-break-down/
    01/06/24 09:38:47.479 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /bing-break-down/
    01/06/24 09:38:47.480 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/cash-withdraw-overseas/ [ori] /cash-withdraw-overseas/
    01/06/24 09:38:47.481 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /cash-withdraw-overseas/
    01/06/24 09:38:47.482 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/wikipedia-breakdown/ [ori] /wikipedia-breakdown/
    01/06/24 09:38:47.482 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /wikipedia-breakdown/
    01/06/24 09:38:47.483 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/oligodb/ [ori] /oligodb/
    01/06/24 09:38:47.483 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /oligodb/
    01/06/24 09:38:47.484 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/centos-7-udisk-install/ [ori] /centos-7-udisk-install/
    01/06/24 09:38:47.484 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /centos-7-udisk-install/
    01/06/24 09:38:47.485 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/buy-house-shenzhen/ [ori] /buy-house-shenzhen/
    01/06/24 09:38:47.485 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /buy-house-shenzhen/
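For a long crawler log like the one above, a short script can tally the results instead of reading them by eye. This is a sketch that assumes each result line has the form `[status] <marker> <word> [url] <path>`, as in the excerpt. Only the `Miss` status appears in this log, so any other status word is an assumption, and `tally_statuses` is a hypothetical helper name.

```python
import re
from collections import Counter, defaultdict

# Matches e.g. "[status] ?? Miss [url] /hello-world/".
# The marker before the status word (garbled as "??" here) is optional.
STATUS_RE = re.compile(r"\[status\]\s+(?:\S+\s+)?(\w+)\s+\[url\]\s+(\S+)")


def tally_statuses(lines):
    """Return (counts per status word, paths grouped by status word)."""
    counts = Counter()
    urls = defaultdict(list)
    for line in lines:
        match = STATUS_RE.search(line)
        if match is None:
            continue  # "Crawling ..." and other non-result lines are skipped
        status, path = match.groups()
        counts[status] += 1
        urls[status].append(path)
    return counts, dict(urls)
```

Passing the lines of the crawler log to `tally_statuses` gives the number of URLs per status, which makes it easy to confirm whether every URL was reported as a miss.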
    
    Thread Starter billzt

    (@billzt)

    In the logs, all records are shown as missing.

    However, it should be noted that my origin server doesn’t have any SSL certificate set up. I just rely on CF. This means that if I turned off CF, users could not visit my URLs such as https://springwood.me/hello-world/ . Is this the reason, and should I set up an SSL certificate on my origin server?

    Plugin Support qtwrk

    (@qtwrk)

    Oh, now that makes sense. Yes, you will need a valid cert on your origin.
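One way to check the origin independently of Cloudflare is to open a TLS connection straight to the origin IP while presenting the site's hostname. The sketch below does this with Python's standard library. It assumes "valid" means the certificate chains to the system trust store and matches the hostname; whether the crawler applies the same checks is not stated in this thread. `check_origin_tls` is a hypothetical helper name.

```python
import socket
import ssl


def check_origin_tls(origin_ip, hostname, port=443, timeout=5.0):
    """Try a verified TLS handshake with origin_ip, using hostname for SNI.

    Returns (ok, reason). ok is True only if the handshake succeeds and the
    certificate passes the default verification for hostname.
    """
    context = ssl.create_default_context()
    try:
        with socket.create_connection((origin_ip, port), timeout=timeout) as sock:
            # server_hostname sets SNI and is used for hostname verification.
            with context.wrap_socket(sock, server_hostname=hostname):
                return True, "certificate verified for " + hostname
    except ssl.SSLError as exc:
        return False, "TLS failure: %s" % exc
    except OSError as exc:
        # Covers refused connections, timeouts and unreachable hosts.
        return False, "connection failure: %s" % exc
```

For this thread, the call would be `check_origin_tls(<origin IP>, "springwood.me")`. A result of `(False, ...)` suggests the origin cannot serve HTTPS for that hostname without the CDN in front. Note that a certificate issued by a private CA would also fail this default check.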

    Thread Starter billzt

    (@billzt)

    Hi qtwrk, now it works. Thank you.

    I recommend updating the documentation to note that an SSL certificate on the origin server is necessary even when using CF.

    Plugin Support qtwrk

    (@qtwrk)

    Glad to know it works now. I will advise our doc team about it.

  • The topic ‘crawler always shows blue (missing)’ is closed to new replies.