Is it OK to convert an entire blog database to UTF-8?
Hello,
One of the blogs I host has an issue. I’m not asking for support with that issue itself, but with its follow-up. Please allow me to elaborate.
To sum it up quickly: is it OK to convert an entire mixed-encoding database to UTF-8 (more precisely, utf8mb4), or not?
*
The context: I moved the sites I host from an old dedicated server (obsolete Debian, LAMP, MySQL as the database engine) to a new one (latest stable Debian, buster, LAMP, but with a new database engine, MariaDB).
At the moment, the collations of the problem blog’s tables are a huge mixed bag: mostly latin1_swedish_ci, with a few utf8_general_ci. Here’s an ugly screenshot: https://imgur.com/a/ziyBLQ2
The blog’s issue (for which I do NOT ask for support): even though the database of the problem blog was properly exported and re-imported, retaining its encoding and collation properties, there were 2 encoding problems.
First, the typical problem of special characters (apostrophes and the like) turning into garbage; second, “really” special characters (such as Japanese letters) turning into “?” question marks.
This second issue proved it was not just a broken import: a manual UPDATE query in phpMyAdmin would return “#1366 – Incorrect string value”, followed by a series of \x values.
The first issue is very close to being fixed. Rather than an endless series of search-and-replace operations, editing the whole .sql file in Notepad++ does the trick: open the .sql file, choose Encoding > Convert to ANSI, then Encoding > Encode in UTF-8. It’s annoying, but it *works*, though I have no idea why. I mention the method in case people with the same problem land here from a web search in the distant future.
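My best guess at why the Notepad++ round trip works, which I have not verified against this particular dump: the text was UTF-8 whose bytes got read as latin1 somewhere along the way, so each special character turned into two or three garbage characters. Writing the garbage back out as single bytes and then re-reading those bytes as UTF-8 would undo that. The query below is only a sketch of that idea in SQL. It assumes a utf8mb4 client connection, and the sample character is mine, not taken from the blog.

```sql
-- Sketch only: assumes the client connection uses utf8mb4.
SET NAMES utf8mb4;

SELECT
  -- Take the UTF-8 bytes of 'é' and read them as latin1:
  -- this reproduces the kind of garbage seen in the posts.
  CONVERT(CAST(CONVERT('é' USING utf8mb4) AS BINARY) USING latin1) AS garbled,
  -- Write the garbage back as latin1 bytes, then read those bytes as UTF-8:
  -- this is the same round trip as "Convert to ANSI" + "Encode in UTF-8".
  CONVERT(CAST(CONVERT('Ã©' USING latin1) AS BINARY) USING utf8mb4) AS repaired;
```

If the guess is right, the garbled column should show the two-character garbage and the repaired column the original letter.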
The second issue, for its part, can be fixed with a phpMyAdmin query:
ALTER TABLE
wp_posts
CONVERT TO CHARACTER SET utf8mb4
COLLATE utf8mb4_unicode_ci;
After the posts table is converted to UTF-8 (the utf8mb4 variant, which the internet says is even better), every question mark goes back to looking like the special character it’s meant to be, such as the Japanese letters, and you can add those characters without issues.
What I’m asking for help with: I am tempted to convert the WHOLE database to utf8mb4 and call it a day, to avoid all potential future issues.
But please, would you know if it’s safe to do so and bid the old latin1 encoding farewell?
Or should I convert only the strict minimum number of tables, in case there are hidden problems I’m not aware of?
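For reference, this is roughly what I have in mind for the whole-database route. It is a sketch, not something I have run yet: `my_blog_db` is a placeholder for the real database name, and the first query only prints the ALTER statements so that they can be reviewed (and the database backed up) before anything is executed.

```sql
-- Sketch only: 'my_blog_db' is a placeholder database name.
-- Print one ALTER statement per table that is not yet utf8mb4_unicode_ci.
-- Nothing is modified by this SELECT; its output is meant to be reviewed first.
SELECT CONCAT(
         'ALTER TABLE `', TABLE_NAME,
         '` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;'
       ) AS alter_sql
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'my_blog_db'
  AND TABLE_TYPE = 'BASE TABLE'
  AND TABLE_COLLATION <> 'utf8mb4_unicode_ci';

-- Optionally change the database default too, so that tables created later
-- start out as utf8mb4. This does not touch existing tables.
ALTER DATABASE `my_blog_db` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```

As far as I understand, CONVERT TO CHARACTER SET re-encodes the stored text from each table’s declared character set, so a table whose contents are already garbage would presumably still hold garbage after the conversion.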
*
Sorry for the long post, thanks if you have an idea and can share it!