• Resolved Duvenhage

    (@duvenhage)


    Hi.

    I know this thread is more about PHP coding than about WordPress itself, but posting here was my last hope.

    I’m using the function move_uploaded_file so that users can upload files into the system. The problem is that when they upload files with Persian characters in their names, the uploaded file ends up with seemingly random, unreadable characters in its name.

    I searched through many forums and tried many functions, like iconv, quoted_printable_encode / quoted_printable_decode, and mb_convert_encoding, but none worked. I believe that is because they all come into action before the move_uploaded_file function runs, so the output is always the same.

    Many people said it was due to some problem with the function itself, which was later fixed in PHP 7.

    But the thing is, WordPress does the job perfectly even on older PHP versions, from before PHP 7 existed. On WordPress v3.8, for instance, when you upload a file with a Persian name, the original name is kept and used with its encoding intact.

    So I decided to come around here asking for some help or hints to know what trick WordPress is using.

    I know my question is very simple and the answer is not, but even small hints will be highly appreciated.

    Just to mention, I have very basic knowledge of PHP and absolutely zero knowledge of WordPress coding.

    Thanks,

Viewing 3 replies - 1 through 3 (of 3 total)
  • You can encode/decode the file name with base64:
    1. When uploading the file, apply base64_encode() to the name (without the extension).

    $name = 'نور صلح';  // original name (UTF-8), without extension
    $ext = 'jpg';
    $fname = base64_encode($name) . '.' . $ext;  // base64-encoded name plus extension
    
    echo $fname;  // 2YbZiNixINi12YTYrQ==.jpg

    2. When fetching the file from server, apply base64_decode() to the filename (after extracting the extension) to get the correct name.

    $filename = '2YbZiNixINi12YTYrQ==.jpg';  // base64-encoded name plus extension
    $name_ext = explode('.', $filename, 2);  // array with [name, extension]
    $name = base64_decode($name_ext[0]) . '.' . $name_ext[1];  // decode the name only
    
    echo $name;  // نور صلح.jpg
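
    One caveat with this approach: the standard base64 alphabet includes "/" and "+", and "/" cannot appear in a file name. A minimal sketch of a URL-safe variant follows. The function names encode_upload_name() and decode_upload_name() are hypothetical and not part of PHP or WordPress, and the script is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helpers (not from PHP or WordPress): URL-safe base64 for file names.

/** Encode everything before the last dot; keep the extension readable. */
function encode_upload_name(string $original): string
{
    // "." is a single byte that never occurs inside a UTF-8 multibyte
    // sequence, so a byte-level search is safe here.
    $dot = strrpos($original, '.');
    if ($dot === false || $dot === 0) {
        $name = $original;  // no extension to preserve
        $ext  = '';
    } else {
        $name = substr($original, 0, $dot);
        $ext  = substr($original, $dot);  // includes the leading dot
    }
    // Swap "+" and "/" for "-" and "_" so the result is a valid file name.
    return strtr(base64_encode($name), '+/', '-_') . $ext;
}

/** Reverse encode_upload_name(); returns the input unchanged if decoding fails. */
function decode_upload_name(string $stored): string
{
    $dot  = strrpos($stored, '.');
    $name = $dot === false ? $stored : substr($stored, 0, $dot);
    $ext  = $dot === false ? '' : substr($stored, $dot);

    $decoded = base64_decode(strtr($name, '-_', '+/'), true);
    if ($decoded === false) {
        return $stored;
    }
    return $decoded . $ext;
}

$stored = encode_upload_name('نور صلح.jpg');
echo $stored, PHP_EOL;                      // 2YbZiNixINi12YTYrQ==.jpg
echo decode_upload_name($stored), PHP_EOL;  // نور صلح.jpg
```

    The trade-off is that files on disk no longer have human-readable names, so this is a workaround rather than a fix for the underlying encoding issue.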
    Moderator bcworkz

    (@bcworkz)

    WordPress does nothing special or unique. What it does is specify UTF-8 consistently in every possible place: the PHP locale, the content-type header, the charset, and so on. A page starting out with:

    <?php
       // The locale must be installed on the server for setlocale() to take effect.
       setlocale( LC_ALL, 'fa_IR.UTF-8' );
       header( 'Content-Type: text/html; charset=utf-8' );
    ?>
    <html lang="fa-IR">
    <head>
       <meta charset="UTF-8">

    should send uploads with their UTF-8 filenames intact, and PHP should handle them as UTF-8. Note that the language_country code “fa_IR” is only an example; any valid language_country code will do. WordPress’s consistency with UTF-8 goes beyond the factors mentioned here: several database-related settings also need to be specified as UTF-8, but they are unrelated to file uploads.
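
    On the PHP side, a minimal sketch of an upload handler that keeps the UTF-8 name could look like the following. The helper clean_utf8_filename(), the form field name "userfile", and the uploads/ directory are all assumptions made for illustration; this is not how WordPress itself does it. The helper avoids basename(), which the PHP manual describes as locale-aware, and splits on path separators at the byte level instead.

```php
<?php
// Hypothetical helper: derive a safe target name while keeping UTF-8 intact.
function clean_utf8_filename(string $clientName): ?string
{
    // Reject anything that is not valid UTF-8 rather than guessing an encoding.
    // An empty pattern with the /u modifier only matches valid UTF-8 subjects.
    if (preg_match('//u', $clientName) !== 1) {
        return null;
    }

    // Keep only the part after the last "/" or "\" to drop any directory path.
    $parts = preg_split('~[/\\\\]~', $clientName);
    $name  = end($parts);

    // Remove ASCII control characters; Persian letters are left untouched.
    $name = preg_replace('/[\x00-\x1F\x7F]/u', '', $name);

    return ($name === '' || $name === '.' || $name === '..') ? null : $name;
}

// Usage sketch: "userfile" and uploads/ are assumed names, not from the thread.
if (isset($_FILES['userfile']) && $_FILES['userfile']['error'] === UPLOAD_ERR_OK) {
    $name = clean_utf8_filename($_FILES['userfile']['name']);
    if ($name !== null) {
        move_uploaded_file(
            $_FILES['userfile']['tmp_name'],
            __DIR__ . '/uploads/' . $name
        );
    }
}
```

    Whether the name then appears correctly on disk still depends on the server's filesystem and operating system, which this sketch does not address.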

    FWIW, the “random unreadable” characters that you see in place of Persian are URL-encoded hexadecimal representations of the Persian characters. They are not random. You could get them back to UTF-8 characters by running the string through urldecode(). Naturally, it’s better to set everything up to use UTF-8 directly than to rely on alternative encodings of any sort.
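
    As a small illustration of that decoding step, here is a sketch that assumes the stored name is the percent-encoded form of a UTF-8 string. The sample value is made up for the example; it is the word "نور" followed by an extension.

```php
<?php
// Assumed sample: percent-encoded UTF-8 bytes D9 86 D9 88 D8 B1, plus ".jpg".
$stored = '%D9%86%D9%88%D8%B1.jpg';

// urldecode() turns each %XX triplet back into the byte it represents.
$readable = urldecode($stored);
echo $readable, PHP_EOL;  // نور.jpg

// rawurlencode() goes the other way and reproduces the stored form.
echo rawurlencode($readable), PHP_EOL;  // %D9%86%D9%88%D8%B1.jpg
```

    Note that urldecode() also converts "+" to a space. If names may contain a literal "+", rawurldecode() leaves it alone.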

    Thread Starter Duvenhage

    (@duvenhage)

    @mateusgetulio and @bcworkz :

    Thanks a lot for both your replies and help. I really appreciate it.

  • The topic ‘UTF-8 characters and upload feature’ is closed to new replies.