Trouble converting database containing special characters to UTF-8
-
Summary:
I have a WordPress database that is filled with special characters encoded in different ways, and this is interfering with my attempts to convert it all to UTF-8.
Technical Specs:
WordPress version: 4.1.1
Hosted at NearlyFreeSpeech
Backstory:
I have a WordPress blog, currently running WordPress 4.1.1. I’ve had the same blog for close to ten years. A while ago–unfortunately, I can’t remember when–I noticed that old posts were filled with weird character strings wherever I had a special character. For instance, I saw na?ˉve instead of na?ve. I corrected these glitches when I saw them, opening the post in the editor and replacing ?ˉ with ?.
When I saved the post, the new special character was displaying instead of the original garbled mess. I was happy.
I had some old posts that contained Japanese text, and the Japanese text was goofy here, too. Instead of 日本語, it would display ?—¥???èa?. Only, when I tried to correct that in the post editor, the Japanese characters I entered would be replaced entirely with question marks upon hitting “save” or “publish”.
I didn’t replace all of those garbled strings, though; I still have some posts that show na?ˉve instead of na?ve (for example). This means that my blog currently contains:
- mis-encoded special characters (na?ˉve)
- “proper” special characters (na?ve)
- mis-encoded Japanese (?—¥???èa?)
- question marks where I attempted to correct the mis-encoded Japanese (???)
What I want:
I want to make changes to my current database so that:
- Japanese (and other non-Latin characters) can be saved and displayed
- There are no strings of mis-encoded special characters
- All of the characters I “corrected” in the past still display
Steps taken:
Since I’m having problems on a WordPress installation that I’ve used for nearly a decade, I identified the problem as one with character encoding. I recognize that old versions of WordPress, until v2.2, used the latin1 character set and the latin1_swedish_ci collation. When I look at my database in phpMyAdmin, I see that I have numerous tables that are still collated as latin1_swedish_ci.
I’ve tried, in numerous ways, to convert my database to UTF-8. Here’s what I’ve done:
- Use the UTF-8 Database Converter plugin
- Follow this guide to export the database and replace all instances of “latin1” with “UTF8”
- As described in the Codex, use a SQL script to convert tables and columns to blob, and then to UTF-8 text./li>
- As described in the Codex, use a SQL script to recast tables and columns to latin1 text, then to blob, and then to UTF-8 text
Each time I’ve done this, I’ve had the same result.
Actual results:
Upon creating the new database in mySQL, the following things happen:
- “Mis-encoded” special characters (na?ˉve) and Japanese text (?—¥???èa?) convert into legible strings (na?ve and 日本語)
- “Corrected” special characters cause data loss. An entry in the database that originally read, “Today, it is 37° and sunny” will lose all of its content from the first special character onward, becoming “Today, it is 37”. As far as I can tell, this affects all tables in the database, including wp_options.
- Question marks where I attempted to correct the mis-encoded Japanese remain question marks.
- When I create a new post containing special characters and Japanese, those characters are saved correctly to the database, and display without issue.
Help?
I know nothing about databases, but I really really want to believe there’s a way for me to correct my database so that Japanese saves and displays correctly… without also screwing up hundreds of entries that I enthusiastically “corrected” earlier, or configuration options.
Any advice at all that can point me in the right direction would be greatly appreciated.
Thank you.
- The topic ‘Trouble converting database containing special characters to UTF-8’ is closed to new replies.