Mga tip sa pangingisda ng jili withdrawal,Irich bingo jili register app.Recharge Every day and Get Bonus up-to 50%!

Resolved Southwest
(@southwest)

9 years, 9 months ago
Summary:

I have a WordPress database that is filled with special characters encoded in different ways, and this is interfering with my attempts to convert it all to UTF-8.

Technical Specs:

WordPress version: 4.1.1

Hosted at NearlyFreeSpeech

Backstory:

I have a WordPress blog, currently running WordPress 4.1.1. I’ve had the same blog for close to ten years. A while ago–unfortunately, I can’t remember when–I noticed that old posts were filled with weird character strings wherever I had a special character. For instance, I saw na?ˉve instead of na?ve. I corrected these glitches when I saw them, opening the post in the editor and replacing ?ˉ with ?.

When I saved the post, the new special character was displaying instead of the original garbled mess. I was happy.

I had some old posts that contained Japanese text, and the Japanese text was goofy here, too. Instead of 日本語, it would display ?—￥???èa?. Only, when I tried to correct that in the post editor, the Japanese characters I entered would be replaced entirely with question marks upon hitting “save” or “publish”.

I didn’t replace all of those garbled strings, though; I still have some posts that show na?ˉve instead of na?ve (for example). This means that my blog currently contains:
- mis-encoded special characters (na?ˉve)
- “proper” special characters (na?ve)
- mis-encoded Japanese (?—￥???èa?)
- question marks where I attempted to correct the mis-encoded Japanese (???)
What I want:

I want to make changes to my current database so that:
1. Japanese (and other non-Latin characters) can be saved and displayed
2. There are no strings of mis-encoded special characters
3. All of the characters I “corrected” in the past still display
Steps taken:

Since I’m having problems on a WordPress installation that I’ve used for nearly a decade, I identified the problem as one with character encoding. I recognize that old versions of WordPress, until v2.2, used the latin1 character set and the latin1_swedish_ci collation. When I look at my database in phpMyAdmin, I see that I have numerous tables that are still collated as latin1_swedish_ci.

I’ve tried, in numerous ways, to convert my database to UTF-8. Here’s what I’ve done:
1. Use the UTF-8 Database Converter plugin
2. Follow this guide to export the database and replace all instances of “latin1” with “UTF8”
3. As described in the Codex, use a SQL script to convert tables and columns to blob, and then to UTF-8 text./li>
4. As described in the Codex, use a SQL script to recast tables and columns to latin1 text, then to blob, and then to UTF-8 text
Each time I’ve done this, I’ve had the same result.

Actual results:

Upon creating the new database in mySQL, the following things happen:
- “Mis-encoded” special characters (na?ˉve) and Japanese text (?—￥???èa?) convert into legible strings (na?ve and 日本語)
- “Corrected” special characters cause data loss. An entry in the database that originally read, “Today, it is 37° and sunny” will lose all of its content from the first special character onward, becoming “Today, it is 37”. As far as I can tell, this affects all tables in the database, including wp_options.
- Question marks where I attempted to correct the mis-encoded Japanese remain question marks.
- When I create a new post containing special characters and Japanese, those characters are saved correctly to the database, and display without issue.
Help?

I know nothing about databases, but I really really want to believe there’s a way for me to correct my database so that Japanese saves and displays correctly… without also screwing up hundreds of entries that I enthusiastically “corrected” earlier, or configuration options.

Any advice at all that can point me in the right direction would be greatly appreciated.

Thank you.

Viewing 1 replies (of 1 total)

Thread Starter Southwest
(@southwest)

9 years, 9 months ago
I’ve fixed this, and want to share what I did.

The issue was one of mixed encoding. Some fields contained data encoded properly as UTF-8; others contained data encoded as something else, probably ISO-8859-1. When imported to a new UTF-8 database, this was causing truncation.

My steps to solve this:
1. Copy the original database, wordpress, to a new database, wordpress2. Make sure the collation of wordpress2 is set to UTF-8.
2. Follow any one of the steps above to convert the tables and columns of wordpress2 to UTF-8. This will cause truncation.
3. For each row on wordpress containing entries where the data when converted to UTF-8 was not the same as the data when converted to ASCII, update the corresponding row in wordpress2 with the wordpress data, converted to UTF-8. A sample script is below.
The script:

UPDATE wordpress3.wp_options wp3 INNER JOIN wordpress.wp_options wp ON (wp.option_id = wp3.option_id AND convert(wp.option_value using utf8) != convert(wp.option_value using ascii)) SET wp3.option_value = convert(wp.option_value using utf8);

A more knowledgeable friend wrote a series of scripts for me that queried information_schema to locate all the columns containing entries where convert(value using utf8) != convert(value using ascii), and to then generate versions of the script above for them.

Results:
It worked! On my new database, I can save Japanese without it turning into question marks (because the character set has successfully been set to UTF-8), and all the mis-encoded fields that were causing data truncation have been fixed.

There are some posts that still contain misencoded characters, but since I can find almost all of those strings by searching for a, ?, ?, or ?, I can just go in and manually replace them. I’m not troubled by that.

Viewing 1 replies (of 1 total)

The topic ‘Trouble converting database containing special characters to UTF-8’ is closed to new replies.

Trouble converting database containing special characters to UTF-8

Tags

Topics

Topics with no replies

Non-support topics

Resolved topics

Unresolved topics

All topics