• Resolved Sabinooo

    (@sabinooo)


    Hello,

    One of the blogs I host has an issue, however I’m not asking for support with that issue, but with its follow-up. Please allow me elaborate ??

    But to sum it up real quick, the final question is to know if it’s OK to convert an entire mixed-encodings database to UTF-8 (more precisely UTF-8mb4), or not.

    *

    The context: I moved the sites I host from a dedi (obsolete Debian, LAMP, mysql database engine) to a new dedi (latest stable Debian buster, LAMP, but a new database engine, mariadb).

    At the moment, the encoding of the problem blog’s tables is a huge mixed bag, mostly latin1_swedish_ci, with a few utf8_general_ci, here’s an ugly screenshot: https://imgur.com/a/ziyBLQ2

    The blog’s issue (for which I do NOT ask for support): even though the database of the problem blog was properly exported and re-imported, retaining its encoding and collation properties, there were 2 encoding problems.
    First the typical problem of special chars (such as apostrophes, etc) becoming garbage, and second of “really” special chars (such as Japanese letters) becoming “?” question marks.
    This second issue proved it was not just a problem of broken import: a manual UPDATE query in PhpMyAdin would return “#1366 – Incorrect string value” followed by \xVALUES\ series.

    The first issue is very close to being fixed (rather than an endless series of search and replace, a whole .Sql file edit in Notepad++ will do its magic, open .sql, Encoding > Convert to ANSI, Encoding > Encode in UTF-8, even though it’s annoying, it *works* but no idea why. Still, I mention the method in case people come here from a web search in the distant future because they have the same problem.)

    The second issue can, as for it, can be fixed with a phpmyadmin query:
    ALTER TABLE
    wp_posts
    CONVERT TO CHARACTER SET utf8mb4
    COLLATE utf8mb4_unicode_ci;

    After the posts table is converted to UTF-8 (mb4 option, the internet says it’s even better), every question mark is back to looking like the special chars it’s meant to be, such as the Japanese letters, and you can add those characters without issues.

    What I’m asking for help with: I am tempted to convert the WHOLE database to UTF-8mb4, and call it a day, to avoid all future potential issues.

    But, please, would you know if it’s safe to do so, and bid old latin1 encoding farewell?

    Or should I only convert the strict obligatory minimum number of tables, as there would be hidden problems I’m not aware of?

    *

    Sorry for the long post, thanks if you have an idea and can share it!

    • This topic was modified 4 years, 7 months ago by Sabinooo.
    • This topic was modified 4 years, 7 months ago by Sabinooo.
Viewing 4 replies - 1 through 4 (of 4 total)
  • Moderator Steven Stern (sterndata)

    (@sterndata)

    Volunteer Forum Moderator

    Well, here’s what I’d do.

    1. Shut down the website.
    2. Use mysqldump to backup the database.
    3. Execute the query to convert everything.
    4. Start the site and take a look.
    5a. If it’s OK, pat myself on the back and have lunch.
    5b. If not, use the mysql command to import the dump, putting things back to the way they were.

    Thread Starter Sabinooo

    (@sabinooo)

    Thanks for the feedback Steve!

    I’m not alien to similar methods, namely cloning the database, doing all kinds of black magic (AKA SQL queries that I only half understand, found elsewhere on the internet) on the clone, and briefly updating wp-config to use the cloned database instead of the official one, to see what it looks like.

    But it’s a very active blog, with several updates each day, and there’s the risk to wrongly judge the UTF-8mb4 conversion was fine, only to discover weeks later there IS a problem with some elements, and then there’ll be only the grim choice of rollbacking and losing weeks of activity.

    Ideally, I’d rather avoid a rollback because I didn’t triple-check and ask for feedback on the wordpress forums ??

    • This reply was modified 4 years, 7 months ago by Sabinooo.
    Moderator Samuel Wood (Otto)

    (@otto42)

    www.ads-software.com Admin

    Yes, everything should be utf8mb4.

    Advice: Set up a brand new test install. Look at what it now makes by default. Convert your old DB to be the same as the new one.

    Thread Starter Sabinooo

    (@sabinooo)

    Thanks Otto!

    I’ve just tried that, and that new install had everything in the mb4 variant. Which is a source of relief, I guess, that’s the new standard for the setup I have, so it should go well.

    Tomorrow’s the leap of faith, here’s to hoping, huhu.

    Again, thank you for the forum helpers, all those years, you guys are great!

Viewing 4 replies - 1 through 4 (of 4 total)
  • The topic ‘Is it OK to convert an entire blog database to UTF-8?’ is closed to new replies.