Patrick Mylund Nielsen
Forum Replies Created
Ah, cool. And yeah, nginx serving static files is going to be faster than most setups, since it’s essentially just serving them from memory and nothing else (on Linux, at least).
I just created the group, so there aren’t many members yet. Anyone can email the list, though. Feel free to post any questions here or there.
I’ve created an OCP Google group. Feel free to post there if you want to discuss optimal setups!
Pages and resources from the CDN load very fast in Stockholm, at least! Good job. Are you using the disk page cache? If you’re not, it would be interesting to see the speed with it enabled.
I’m glad! Please let me know if you have any problems.
Of whole pages, yeah, it definitely should be. I would still recommend using memcache caching of queries and APC opcode caching for very large or very dynamic (i.e. short-expiration) sites, if only to make the priming process (after expiration) a lot faster.
It gets 10x faster simply by not having to pass each request to a PHP process. If you use a web server like nginx or Cherokee, it’s ridiculous how big a difference it can make.
Oh, feel free to email me.
That’s great to hear!
As far as what to do, I would definitely recommend making (possibly manually) a sitemapindex that just points to the sitemaps for the different sites. (The sitemapindex can point to local files too, e.g. /var/www/site/bla/sitemap.xml.) OCP can scale to an arbitrary number of simultaneous priming processes, and the stress on the disk will be significantly less with one OCP process than with, say, 10. The only limitation is memory (it keeps all of the URLs in memory at the same time), but this is the same for separate processes. This way, you can also set a single concurrency (-c) and maximum (--max) level.
It really depends on how your caching is set up. For W3 Total Cache, when cached pages are stored in pgcache/<domain>/<page>, having just one OCP process for all sites should work. If they’re in separate folders, then (at the moment, at least) you’d need separate processes.
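For illustration, here’s a minimal Python sketch that writes such a sitemapindex in the standard sitemaps.org format. The three site URLs are hypothetical placeholders; swap in the real sitemap locations for your network.

```python
from xml.sax.saxutils import escape

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemapindex(sitemap_locations):
    """Return a sitemapindex XML document with one <sitemap> entry per location."""
    entries = "\n".join(
        # escape() turns characters like & and < into valid XML entities.
        f"  <sitemap><loc>{escape(loc)}</loc></sitemap>"
        for loc in sitemap_locations
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{SITEMAP_NS}">\n'
        f"{entries}\n"
        "</sitemapindex>\n"
    )


if __name__ == "__main__":
    # Hypothetical sitemaps for three sites in the network.
    sites = [
        "https://site-a.example.com/sitemap.xml",
        "https://site-b.example.com/sitemap.xml",
        "https://site-c.example.com/sitemap.xml",
    ]
    print(build_sitemapindex(sites))
```

Publish the output somewhere OCP can fetch it and pass its location to OCP as you would a normal sitemap.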
I’d like to help out if you can’t get the whole network sped up. Right now there isn’t a separate forum, but that’s actually a pretty good idea since there’s a fairly large community of OCP users now. Maybe I’ll set something up.
PS: Try running ocp with the -v flag to see whether OCP is actually finding the existing pages on the disk, or whether it’s priming everything every time (it shouldn’t do that if the -l path is correct).
Yes, that’s the one.
You’re very welcome!
It should be. Right now you’re saving the cached pages in APC (which is in-memory). When you save them to disk, Linux takes care of caching all (or the most frequently visited) files in memory, so it doesn’t have to read from disk every time. In most cases I’ve seen, this is actually faster than APC or memcache (which has to transmit and encode/decode data over the network). (Also, most mail systems store each email as a file, and I can assure you there are systems that house more than just a few hundred thousand emails!)
A file cache would become an issue when you have so many pages that your filesystem runs out of space or inodes, but if you’re using a relatively modern filesystem like ext3, ReiserFS or ext4, you should be good. Also, if you reach that point, or run out of memory for APC, then you may not want to cache the rendered pages at all, but only have an object cache, since you’d be making way too many copies of the same thing when you can just cache different parts of the pages (which is what the object and query caches do.) In that scenario, you’d only need to make a sitemap for OCP that contains a reference to one of each type of page that you have on your site, e.g. a tag archive page, a post, a page, the blog index, and so on.
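If you want to keep an eye on those limits, here’s a rough Python sketch (Unix only) that reports free space and free inodes for the filesystem holding the cache directory. The cache path in the example is a hypothetical placeholder.

```python
import os


def cache_fs_headroom(path):
    """Report free space and inodes on the filesystem that holds `path` (Unix only)."""
    st = os.statvfs(path)
    total_inodes = st.f_files
    free_inodes = st.f_favail  # inodes available to non-root users
    return {
        # Blocks available to non-root users times the fundamental block size.
        "free_bytes": st.f_bavail * st.f_frsize,
        "free_inodes": free_inodes,
        # Some filesystems report 0 total inodes (allocated dynamically); treat as 0% used.
        "inode_use": (1 - free_inodes / total_inodes) if total_inodes else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical cache directory; falls back to "/" if it doesn't exist.
    cache_dir = "/var/www/example.com/wp-content/cache"
    target = cache_dir if os.path.exists(cache_dir) else "/"
    print(cache_fs_headroom(target))
```

If inode_use creeps toward 1.0 long before the disk is full, that’s the point where caching whole rendered pages stops making sense.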
Let me know how it goes.
If you are using APC object caching and on-disk page caching, then yes. If you’re saving the cached pages themselves in APC, then you can omit “-l /var/www/patrickmylund.com/wp-content/w3tc/pgcache -ls _index.html”. Those options tell OCP where to look on disk for pages that are already cached, and not to make HTTP requests for any pages it finds there.
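To make the -l/-ls idea concrete, here’s a small Python sketch of that kind of lookup. It assumes the cached file for a URL lives at <cache root>/<URL path>/<suffix>, or <cache root>/<domain>/<URL path>/<suffix> for the per-domain pgcache/<domain>/<page> layout. That’s my reading of the W3 Total Cache layout, not OCP’s actual code, and both function names are hypothetical.

```python
import os
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def cached_page_path(url, cache_root, suffix="_index.html", per_domain=False):
    """Return the on-disk path where a page cache is assumed to store `url`."""
    parts = urlsplit(url)
    # Drop the empty segments produced by leading/trailing slashes.
    segments = [s for s in parts.path.split("/") if s]
    base = PurePosixPath(cache_root)
    if per_domain:
        # Assumed layout: <cache_root>/<domain>/<page path>/<suffix>
        base = base / parts.hostname
    return str(base.joinpath(*segments, suffix))


def urls_to_request(urls, cache_root, **layout):
    """Keep only URLs with no cached file on disk; those still need an HTTP request."""
    return [
        u for u in urls
        if not os.path.isfile(cached_page_path(u, cache_root, **layout))
    ]
```

With the path from the post, https://patrickmylund.com/blog/hello/ would map to /var/www/patrickmylund.com/wp-content/w3tc/pgcache/blog/hello/_index.html under these assumptions.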
In my experience, the disk page cache is much, much faster than APC, memcache, or anything else, at least as long as the site is small enough that it’s feasible to save the pages to disk.
The best combo I’ve found is memcache for query caching, APC for object caching, and disk cache for the page cache.
OCP just reads the sitemap, so if your sitemap doesn’t contain a reference to the pages you want to prime, or a reference to other sitemaps that contain them (a sitemapindex), it’s not going to know what to probe.
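Roughly, the sitemap-reading step looks like this Python sketch: it collects the <loc> entries from a <urlset>, and for a <sitemapindex> it loads each child sitemap and recurses. The `load_sitemap` callback is a hypothetical hook (fetch over HTTP, read a local file, and so on); this is an illustration of the idea, not OCP’s implementation.

```python
import urllib.request
import xml.etree.ElementTree as ET

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"  # sitemaps.org namespace prefix


def urls_from_sitemap(xml_bytes, load_sitemap):
    """Return page URLs from a sitemap, following <sitemapindex> entries recursively."""
    root = ET.fromstring(xml_bytes)
    if root.tag == SM + "sitemapindex":
        urls = []
        for loc in root.iter(SM + "loc"):
            # Each <loc> in an index names another sitemap: load it and recurse.
            urls.extend(urls_from_sitemap(load_sitemap(loc.text.strip()), load_sitemap))
        return urls
    # A plain <urlset>: each <url><loc> is a page to prime.
    return [loc.text.strip() for loc in root.iter(SM + "loc")]


def load_over_http(url, timeout=30):
    """Example loader: fetch a child sitemap over HTTP and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

Anything that isn’t reachable through these <loc> entries never gets requested, which is why pages missing from the sitemap stay unprimed.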
If you’re submitting your sitemap to Google/others, it would be a good idea to put all the URLs you want to appear in search engines in your sitemaps. Most sitemap plugins, like Arne Brachhold’s, do this automatically.
I think if you want to keep the archive pages primed but _don’t_ want them to be in your regular sitemap, the easiest solution would be to expand the sample sitemap by adding page/1, page/2, page/3…, page/40, and to run OCP with the --no-warn flag.
If you use PHP with APC/memcache, category page generation should be significantly faster once you’ve primed all the individual posts it would show, even without probing the category pages specifically, because it doesn’t need to look those posts up again.
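A minimal Python sketch for generating those archive entries, assuming WordPress-style /page/N/ permalinks (including the trailing slash) and a hypothetical base URL:

```python
from xml.sax.saxutils import escape


def paged_archive_sitemap(base_url, last_page):
    """Build a <urlset> listing base_url/page/1/ through base_url/page/<last_page>/."""
    base = base_url.rstrip("/")
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for n in range(1, last_page + 1):
        # Assumes pretty permalinks of the form /page/N/ (trailing slash included).
        loc = f"{base}/page/{n}/"
        lines.append(f"  <url><loc>{escape(loc)}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(paged_archive_sitemap("https://example.com", 40))
```

Keeping this as a separate file means the archive pages get primed without ever showing up in the sitemap you submit to search engines.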
Oh. In terms of site size, I know that there are at least 3 people using OCP to prime sites with over 100,000 pages on an hourly basis, and on subsequent runs (when the pages are already cached), the feedback I’ve received is that it always runs in under half a minute.
Hi Msteel,
I’m obviously biased, but let me say I’ve never received negative feedback about Optimus Cache Prime (https://patrickmylund.com/projects/ocp/). It’s very, very fast (especially if you use the -c parameter), has a small memory overhead, and it does what it’s supposed to: crawl pages in sitemaps or sitemapindexes without fetching the whole page contents, keeping the cache primed without draining bandwidth.
If you have any problems with it, I’m more than happy to help with the setup, and I’ll release a new version in a day or so if you find a bug.
Best,
Patrick
Hi Sven,
No root access is necessary, but you do need shell access, e.g. via “ssh” on OS X. From the shell, you can do something like:
(Copy the 32-bit or 64-bit link from the Optimus Cache Prime page depending on the architecture of your server. If you’re not sure which it is, you can try both; one of them will work.)
wget <link>
tar xfvz ocp-x.x-xxxxx.tar.gz
cd ocp
./ocp https://example.com/sitemap.xml
If you don’t have shell access/can’t run commands on the server, e.g. if it’s a shared hosting environment, there’s no easy way to run OCP.
If they wanted to. The only thing OCP really does is it reads the publicly available sitemap file that the site author has created, and crawls the site just like a search engine spider which has read the sitemap would, making normal GET requests for each of the URLs in the sitemap. (Basically the same as if you opened the sitemap.xml file in your browser and manually clicked every link.) Each GET request prompts the server to use whatever caching mechanism(s) it has in place, e.g. W3 Total Cache or WP-Super-Cache. OCP doesn’t decide what mechanism to use, nor how it should be used.
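To illustrate what “making normal GET requests for each of the URLs” amounts to, here’s a minimal Python sketch using only the standard library. It sends one GET per URL with a bounded number of requests in flight (similar in spirit to OCP’s -c concurrency setting) and closes each response without reading the body. The user-agent string is a hypothetical placeholder, and this isn’t OCP’s code.

```python
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor

USER_AGENT = "cache-primer-sketch"  # hypothetical placeholder user-agent


def prime_url(url, timeout=30):
    """Send one GET so the server's own caching layer builds or refreshes the page."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            # The status line is enough; the body is left unread and the response closed.
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still mean the server answered; report the code.
        return err.code


def prime_all(urls, concurrency=4):
    """GET every URL with at most `concurrency` requests in flight; map url -> status or error."""

    def worker(url):
        try:
            return url, prime_url(url)
        except Exception as exc:  # network errors, timeouts: record and keep going
            return url, repr(exc)

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return dict(pool.map(worker, urls))
```

For example, `prime_all(urls, concurrency=4)` with the URLs collected from a sitemap keeps four requests in flight at a time; whatever caching the server has (W3 Total Cache, WP-Super-Cache, etc.) decides what happens on its end.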
It wouldn’t really be possible to add verification and keep OCP in its current form, a stand-alone piece of software. Besides, if somebody wanted to use it maliciously, e.g. to perform a DoS attack, there are much “better” tools they could use, ones that don’t care about sitemaps at all.
If people receive unwanted hits from OCP, they can block the IP address of the person(s) using the tool, or block the user-agent, “Optimus Cache Prime”, itself.
Hi pnommensen,
The easiest way is probably via a shortcut:
- Download the latest Windows zip file from the OCP page
- Right-click the zip file and select “Extract All…” -> Extract
- Open the extracted folder “ocp”
- Right-click ocp.exe and click “Create shortcut”
- Call the shortcut something descriptive like “Prime example.com sitemap”
- Right-click the shortcut, click Properties, and change the target from e.g.
"F:\Downloads\ocp-2.3\ocp\ocp.exe"
to
"F:\Downloads\ocp-2.3\ocp\ocp.exe" https://example.com/sitemap.xml
(it’s important that there are double quotes around the path to ocp.exe)
- Click OK, and then, whenever you want to prime the site, just click the shortcut.
If you want to run it on a schedule, you can add a task in Windows’ Task Scheduler (taskschd.msc) that runs the shortcut.
There are many options you can use. (For example, by default it doesn’t show any information about what it’s doing; you could add “-v” to the target line to make it more verbose.) To see all of the available options, hold down Shift and right-click somewhere in the folder containing OCP, select “Open command window here”, then type “ocp.exe”.
Hope that helps!