Serious UTF-8 related issue in wp_html_excerpt() function
Few hours ago, I changed my blog’s theme into P2. And few minutes ago, I have noticed that P2’s own Recent Comments does not processed UTF-8 strings correctly. See this captured image and you’ll find the Replacement character: ?. So I traced function calls, and found that the problem occurrence was initiated in
wp_html_excerpt()
function.Inside the function, mb_substr() is used to slice the string into given size. Just like this:
$str = mb_substr( $str, 0, $count );
.
My other PHP applications also use mb_substr(), but with one difference: I always specify the encoding parameter. So I added the parameter:
$str = mb_substr( $str, 0, $count, 'UTF-8' );
After this, everything works correctly.
I don’t know why the WP developers omitted the parameter, but adding it also repairs the Permalink section underneath the title field on the ‘Edit Post’ page. Usually I don’t touch WP built-in functions, but this is a serious issue (this time, unlike the Permalink section on the admin page, the broken characters are visible to the public), so I reluctantly modified the function.
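Here is a minimal sketch of the difference, for anyone who wants to reproduce it. It assumes the mbstring extension is loaded and that the file is saved as UTF-8. The ISO-8859-1 setting is only my stand-in for "an environment whose internal encoding is not UTF-8", and the sample string is made up.

```php
<?php
// Sample text: each Hangul syllable takes 3 bytes in UTF-8.
$str = '한글 테스트';

// Assumption: simulate a server whose mbstring internal encoding is not UTF-8.
$previous = mb_internal_encoding();
mb_internal_encoding( 'ISO-8859-1' );

// No encoding argument: every byte counts as one character here, so the
// cut lands in the middle of the second syllable.
$broken = mb_substr( $str, 0, 4 );

// Explicit encoding: the cut falls on a character boundary.
$fixed = mb_substr( $str, 0, 4, 'UTF-8' );

// Restore the original setting.
mb_internal_encoding( $previous );

var_dump( mb_check_encoding( $broken, 'UTF-8' ) ); // bool(false)
var_dump( mb_check_encoding( $fixed, 'UTF-8' ) );  // bool(true)
```

The first result is the kind of truncated byte sequence that a browser renders as the replacement character.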
I hope to see this issue solved in the next version.
Addition:
I found the backward-compatibility code in /wp-includes/compat.php, and now I see why the encoding parameter was omitted: the _mb_substr() function processes only UTF-8. Still, I recommend adding the parameter for the case where the real mb_substr() exists. Without an explicit encoding, the real mb_substr() uses the internal encoding (see mb_internal_encoding()), which is not UTF-8 in some environments, and that produces the strange behavior I described above.
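To illustrate why a UTF-8-only fallback does not need the parameter, here is a rough sketch of how such a function can work. The name utf8_only_substr() and its body are my own assumptions for illustration, not the code in compat.php.

```php
<?php
// Hypothetical UTF-8-only substring helper (not WordPress code).
function utf8_only_substr( $str, $start, $length = null ) {
	// With the /u modifier "." matches one whole UTF-8 character;
	// /s lets it match newlines too.
	preg_match_all( '/./us', $str, $match );

	// Slice by characters, not bytes. A null length means "to the end".
	$chars = array_slice( $match[0], $start, $length );

	return implode( '', $chars );
}

echo utf8_only_substr( '한글 테스트', 0, 4 ); // 한글 테
```

Because the regular expression always treats the input as UTF-8, there is no encoding to choose, which is not true of the real mb_substr().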
Also, don’t forget to add the parameter to mb_strlen(), because it affects how the permalink is abridged in the Permalink section of the ‘Edit Post’ page.
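The same idea applied to length checks might look like the sketch below. abridge_slug() is a hypothetical helper I wrote for illustration, and the limits 30 and 14 are assumed values, not necessarily what WordPress uses.

```php
<?php
// Hypothetical helper (not a WordPress function): shorten a long slug by
// keeping its head and tail, counting characters instead of bytes.
function abridge_slug( $slug, $max = 30, $keep = 14 ) {
	// Explicit encoding, so the count does not depend on server settings.
	if ( mb_strlen( $slug, 'UTF-8' ) <= $max ) {
		return $slug;
	}

	$head = mb_substr( $slug, 0, $keep, 'UTF-8' );
	$tail = mb_substr( $slug, -$keep, $keep, 'UTF-8' );

	return $head . '...' . $tail;
}

// 20 Hangul syllables are 60 bytes but only 20 characters, so this slug
// is returned unchanged.
echo abridge_slug( str_repeat( '가', 20 ) );
```

With a byte-based count the 20-character slug above would be treated as 60 long and abridged even though it fits.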