• If the URL contains Unicode characters, WP Super Cache seems to get confused and not delete the cache when a comment gets posted.

    A specific example is the curly single-quote (’) character. If someone copies the character from Word, it comes across as the UTF-8 URL-encoded sequence %E2%80%99 in the URL/permalink. The plugin seems to be REALLY confused: there are four cache directories. One directory is named with the raw Unicode character, and the other three use successively re-encoded versions: %E2%80%99, %25E2%2580%2599, and %2525E2%252580%252599. URL encoding already URL-encoded strings for the win! This, of course, explains why the caching system is breaking.
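
    For anyone who wants to see how those three directory names come about, here is a minimal Python sketch (not the plugin's actual PHP code) that percent-encodes the character and then encodes the result again. It assumes the character in question is U+2019, which is what %E2%80%99 decodes to:

```python
from urllib.parse import quote

# U+2019 RIGHT SINGLE QUOTATION MARK, the "smart" apostrophe Word inserts.
apostrophe = "\u2019"

once = quote(apostrophe)   # UTF-8 bytes E2 80 99, percent-encoded
twice = quote(once)        # every "%" is itself encoded as "%25"
thrice = quote(twice)      # and once more

print(once)    # %E2%80%99
print(twice)   # %25E2%2580%2599
print(thrice)  # %2525E2%252580%252599
```

    Each extra pass only touches the "%" signs, which is why the names keep growing by "25" at every level.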

    This is happening on a multisite WP 3.0.4 setup. There are several custom-built plugins that I wrote in use but shouldn’t be affecting anything involving WP Super Cache.

Viewing 15 replies - 1 through 15 (of 15 total)
  • Yeah, that’s a known bug unfortunately. I’m not sure of the security consequences of writing filenames with Unicode characters to server filesystems, so I haven’t tried fixing it. Nobody has been able to give me a definitive answer on this.

    Thread Starter bigsite

    (@bigsite)

    I’m actually the right person to ask. A couple years ago, I attempted to write my own Unicode library and learned far more about Unicode than anyone likely cares to know. I got as far as attempting a collation implementation before I realized that I was WAY in over my head and ended up deciding the best solution was to use IBM’s implementation known as “ICU”.

    Short answer: Unicode is evil. And when you want to reach as wide an audience as possible, you need to be able to target the three major OSes with server environments: Linux, Windows, Mac. The only guaranteed character set that will work on all three without issues is the printable subset of ASCII, which should also work on other similar OSes such as Solaris.

    More detail: Modern file systems do support multibyte encodings, but support varies widely. Each file system has several components that interact differently based on where the file operation APIs are being run from (kernel vs. user mode vs. some shell) and how well the running software understands the character set being used – most things use Unicode these days, but other multibyte encodings are also possible, which creates a huge mess for anyone writing new software. Security-wise and for caching purposes, encoding the characters before they are written to the file system is a pretty good way to go, as long as you apply the encoding exactly once and keep the allowed characters to a very specific set. Or use a fast string hashing algorithm. I'm not sure how hashing would translate to Apache .htaccess, but it is something to consider.
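
    A rough Python sketch of both ideas (encode exactly once, or hash), purely as an illustration. The function names are made up for this example, SHA-256 is just one possible hash, and unwinding stacked encodings by decoding repeatedly is an assumption that would mangle a URL that legitimately contains a literal "%25":

```python
import hashlib
from urllib.parse import quote, unquote


def normalize_path(path: str, max_rounds: int = 5) -> str:
    """Undo stacked percent-encoding so one URL always yields one form."""
    for _ in range(max_rounds):
        decoded = unquote(path)
        if decoded == path:
            break
        path = decoded
    return path


def encoded_dir_name(path: str) -> str:
    """Encode exactly once: unreserved ASCII stays, all else becomes %XX."""
    return quote(normalize_path(path), safe="")


def hashed_dir_name(path: str) -> str:
    """Alternative: a fixed-length hex digest, ASCII-only by construction."""
    return hashlib.sha256(normalize_path(path).encode("utf-8")).hexdigest()


# The raw, single-encoded and triple-encoded slugs all map to one name.
for slug in ("it\u2019s", "it%E2%80%99s", "it%2525E2%252580%252599s"):
    print(encoded_dir_name(slug), hashed_dir_name(slug)[:12])
```

    Either way the cache ends up with one directory per URL, so there is only one thing to delete when a comment is posted.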

    By the way, I recently ran some performance tests using ApacheBench (ab), and you’ll be interested to learn that vanilla PHP as an Apache module can serve up content slightly faster than the same vanilla Apache installation serving the identical HTML file directly. I was a bit surprised to see those results – one would think that anything being served through PHP would be significantly slower. This observation may or may not prove useful to you.

    Thread Starter bigsite

    (@bigsite)

    Another slightly related problem: The ‘supercache’ directory contains subdirectories for ‘domain.com’, ‘domain.com.’ (extra ‘.’), and ‘www.domain.com.’.

    That’s WordPress – it answers requests for anything sent to it. You can redirect the domain in your mod_rewrite file to the right one. It’s not for a plugin to decide which is the right domain.

    I think for Unicode characters I may encode them and serve those pages through PHP rather than through mod_rewrite… I could try hashing those directories so the rewrite rules have no chance of finding them, but the PHP could look them up easily.

    Thread Starter bigsite

    (@bigsite)

    Another possibly related problem: The automatic redirect from ‘www.domain.com’ to ‘domain.com’ works in all browsers EXCEPT Safari with this plugin. I get:

    “Safari can’t open the page “https://www.domain.com/”. The error was: “unknown error” (CFURLErrorDomain:303) Please choose Report Bugs to Apple from the Help menu, note the error number, and describe what you did before you saw this message.”

    Odd that. The plugin doesn’t interfere with redirects at all.

    I fixed the comment bug. It’s in the development version on the download page, or will be in about half an hour when it updates, or grab the svn from https://svn.wp-plugins.org/wp-super-cache/trunk

    This Unicode problem is also discussed here.

    Thread Starter bigsite

    (@bigsite)

    Hmm – application/xml shows up first in the Accept headers for Safari. text/html is usually first in most other browsers. The ordering might cause problems. I’m going to dig deeper to see if I can trace the source of the problem.

    Thread Starter bigsite

    (@bigsite)

    I’ve found the problem – WP Super Cache was installed right around the time we also dropped in a whole bunch of other upgrades, including an Apache upgrade. Running Wireshark, I get:

    GET / HTTP/1.1
    Host: www.domain.com
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.19.4 (KHTML, like Gecko) Version/5.0.3 Safari/533.19.4
    Accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
    Accept-Language: en-US
    Accept-Encoding: gzip, deflate
    Cookie: __qca=P0-1371044966-1295985089624
    Connection: keep-alive

    HTTP/1.1 301 Moved Permanently
    Date: Tue, 25 Jan 2011 22:57:53 GMT
    Server: Apache/2.2.17 (Unix) mod_ssl/2.2.17 OpenSSL/1.0.0c DAV/2 PHP/5.3.5
    X-Powered-By: PHP/5.3.5
    Vary: Accept-Encoding,Cookie
    X-Pingback: https://domain.com/xmlrpc.php
    Location: https://domain.com/
    Keep-Alive: timeout=15, max=500
    Connection: Keep-Alive
    Transfer-Encoding: chunked
    Content-Type: text/html; charset=UTF-8

    And nothing after that – the connection terminates (FIN, ACK). This is technically an invalid response and Safari is responding appropriately. Safari is expecting a chunked reply, gets none, and errors out. Other browsers are seeing the ‘Location’ header and do a redirect regardless of the rest of the reply.
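
    To spell out why an empty body is invalid here: a response sent with Transfer-Encoding: chunked has to finish with a zero-size chunk ("0" followed by a blank line), even when there is no content at all. A small Python sketch of that check, with a made-up function name and ignoring optional trailer headers:

```python
def chunked_body_is_complete(body: bytes) -> bool:
    """True if `body` is a chunked payload that ends with a zero-size chunk."""
    pos = 0
    while True:
        line_end = body.find(b"\r\n", pos)
        if line_end == -1:
            return False  # no chunk-size line at all: truncated
        # Chunk size is hexadecimal; anything after ";" is an extension.
        size_field = body[pos:line_end].split(b";", 1)[0].strip()
        try:
            size = int(size_field, 16)
        except ValueError:
            return False
        if size < 0:
            return False
        pos = line_end + 2
        if size == 0:
            # Last chunk: expect the closing CRLF (trailers not handled).
            return body[pos:pos + 2] == b"\r\n"
        chunk_end = pos + size
        if body[chunk_end:chunk_end + 2] != b"\r\n":
            return False  # chunk data cut short or not CRLF-terminated
        pos = chunk_end + 2


print(chunked_body_is_complete(b""))                        # the case above
print(chunked_body_is_complete(b"0\r\n\r\n"))               # valid empty body
print(chunked_body_is_complete(b"5\r\nhello\r\n0\r\n\r\n"))
```

    The first call corresponds to the capture above (headers, then nothing), which is the case Safari refuses to accept.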

    Fixed the issue with a rule before the WP Super Cache rules, thus bypassing whatever WordPress (or the plugin or some combination) is doing wrong:

    # HTTP_HOST holds only the host name (no scheme), so match on that alone
    RewriteCond %{HTTP_HOST} ^www\.domain\.com$ [NC]
    RewriteRule ^(.*)$ https://domain.com/$1 [R=301,L]

    So this is definitely off-topic for the Unicode issue in this thread.

    so was the super cache the problem? i just installed wp simple cache last night and have been waiting for the sky to fall. my objective is of course site speed (i have few widgets and plugins) but i don’t know if a cache will help, or if i’m better off without it. my pages load in under half a second anyway. thanks for any thoughts on this –

    horoscopes – no, his webserver caused the redirect problem. Yes, the plugin had a problem with unicode characters in the url but that’s fixed now.

    Thread Starter bigsite

    (@bigsite)

    @horoscopes – A caching plugin is absolutely necessary if you want more than a small handful of visitors to be able to view the website. Especially on shared hosting where you don’t want to get kicked off for consuming too much CPU.

    WP Super Cache is built upon WP Cache 2, which we view as a fairly stable plugin, although it unfortunately hasn’t been updated since 2007 to reflect the changes up to WP 3.0. But it was the web server or one of the other upgrades that caused the issue. I fixed it by simply bypassing WP and doing a .htaccess redirect. That also happened to eliminate the “WP internal redirect to the root regardless of path” issue that I didn’t like, which should boost our site’s SEO a bit.

    @bigsite: Where are you located geographically?

    I’m currently in Taiwan running a “bigsite” (pun intended) of around 4 million visitors and 20 million pageviews a month. Sounds like we may be tackling some of the same Unicode and traffic-related challenges with WordPress. I’d love to swap notes. I couldn’t find a way to PM you here, so if you are interested, drop me a note at {my user name here} @ hotmail.com.

    i have tried to create a post with ‘ in the slug, but it was automatically removed from the slug. maybe this has been fixed in newer versions.

    Yeah, I couldn’t create a slug like that. I copied some text from an earlier post of yours that had some non-ASCII characters and created this test post.

    i have tried to add the ‘ character with phpmyadmin. it is now in the post_name field of the posts table, but the post page then comes up as “not found”.

  • The topic ‘[Plugin: WP Super Cache] Not deleting cache on comment post’ is closed to new replies.