• Resolved WPChina

    (@wordpresschina)


    So I contacted the WP RSS team a few weeks ago about a problem I had seen for a while using their plugin (I am a paid user of their product add-ons). The problem exists on EVERYONE’s installation of the plugin, but it only affects you if you aggregate non-Western charsets into your WordPress site. Fixing it doesn’t seem like a top priority for WP RSS, so I had to fix it myself and am sharing the solution with you.

    The problem is that the plugin takes all post-content and makes it into ASCII text. This of course doesn’t affect you if you use English, because ASCII English and UTF-8 English are the same. But it does affect you if your sources use gb2312 or big5 Chinese, Korean, Thai, a Sanskrit-based language, etc. It affects you negatively because the WP RSS plugin *INCORRECTLY* changes the UTF-8 content from the original source into ASCII, and then saves it all as ASCII text in the post-content field in MySQL.

    When WordPress then calls that field and displays it, it will most likely display fine on the frontend, because most modern browsers know how to translate ASCII into human-readable text. So WP RSS may aggregate Chinese from an original source, save it as ASCII, and then display it within a browser as normal Chinese again. But that is all cosmetic and the backend is still broken.

    But there is a problem… because the database itself saves the content as ASCII, most functions that trim or try to read that post content will not be able to do anything useful with it. So, for example, if you use a function to trim the excerpt to 10 characters, the function won’t be able to understand what a “character” looks like because it’s all in ASCII.
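    The trimming problem described above can be sketched as follows. This is only an illustration under an assumption: that the "ASCII" form in the database is numeric HTML entities, which is what html_entity_decode() would reverse. The helper example_count_chars() and the sample string are hypothetical and not taken from the plugin.

```php
<?php
// Assumption: the aggregated content was stored as numeric HTML entities,
// i.e. pure ASCII bytes that a browser renders as Chinese.
$stored  = '&#20013;&#25991;&#20869;&#23481;';
$decoded = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');

// Hypothetical helper: count real characters with a Unicode-aware regex.
// (mb_strlen() does the same where the mbstring extension is installed.)
function example_count_chars(string $s): int {
    return (int) preg_match_all('/./us', $s);
}

$stored_len  = example_count_chars($stored);  // every entity costs 8 characters
$decoded_len = example_count_chars($decoded); // one per Chinese character

// A naive 10-character trim of the stored form cuts an entity in half.
$trimmed_stored = substr($stored, 0, 10);

echo $stored_len, ' vs ', $decoded_len, "\n";
echo $trimmed_stored, "\n";
```

    With the entity form, a "10 character" excerpt ends in a broken entity, while the decoded string can be trimmed per character.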

    So the fix is actually quite simple within WP RSS. I looked at the code and I can see where the changes can easily be made to make this a plugin that can be used globally rather than just in Europe and North America (maybe I should make a fork). But instead of changing the plugin, I just made this function:

    // Decode HTML entities in the aggregated content back into UTF-8 characters.
    function wprss_ftp_dumb_converter_post_content_callback( $string, $arg1 ) {
        return html_entity_decode( $string, ENT_QUOTES, 'UTF-8' );
    }
    // The callback takes two parameters, so accept two arguments.
    add_filter( 'wprss_ftp_dumb_converter_post_content', 'wprss_ftp_dumb_converter_post_content_callback', 10, 2 );

    So that should fix the problem for you. I hope the WP RSS developers can integrate this fix into the plugin itself though.

    • This topic was modified 6 years, 9 months ago by WPChina.
Viewing 2 replies - 1 through 2 (of 2 total)
  • Hi @wordpresschina

    I understand you had some trouble with the data charset after importing Chinese characters into your WordPress site. Our team has thoroughly reviewed your code and your recommendations, and has identified several issues with them.

    For this reason, we’d like to address the following so that other users of the plugin are informed as well before attempting your proposed solution and adding any code to their site.

    First off, we’d like to point out that the “non-Western” languages are not the only non-ASCII languages. For example, the Polish and Norwegian (both European) language characters cannot be saved as ASCII, and require UTF-8 instead. Therefore, it is not only Korean, Thai, or other Asian languages that would be affected, but many others too, including European languages. If there were a problem with ASCII vs. UTF-8, this problem would manifest with many of our existing customers.

    The “database itself” does not save anything as ASCII. WordPress creates its tables using the UTF-8 charset by default, and makes every effort to preserve the characters via its functions. Feed to Post uses WordPress functionality to read and write data, and therefore is also UTF-8 compatible. So if a particular environment saves data in ASCII, chances are that the environment is somehow broken.

    WP RSS Aggregator intentionally supports multi-byte strings and UTF-8, for example when trimming or counting words. Even if the PHP environment does not have special functions for dealing with multi-byte strings, WP RSS Aggregator falls back to using special regular expressions to still handle them correctly. If you have found a case in the code where the appropriate multi-byte functions should be used but are not, please let us know via a separate support request.
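    As an illustration of the kind of fallback described above, here is a minimal sketch of multi-byte-safe trimming and word counting. The functions example_trim_chars() and example_count_words() are hypothetical names and are not taken from WP RSS Aggregator's code; the plugin's actual implementation may differ.

```php
<?php
// Hypothetical helper: keep at most $limit characters of a UTF-8 string.
function example_trim_chars(string $text, int $limit): string {
    $limit = max(0, $limit);
    if (function_exists('mb_substr')) {
        // Preferred path when the mbstring extension is available.
        return mb_substr($text, 0, $limit, 'UTF-8');
    }
    // Fallback: with the /u modifier, "." matches one whole UTF-8 character.
    if (preg_match('/^.{0,' . $limit . '}/us', $text, $m) === 1) {
        return $m[0];
    }
    return $text; // invalid UTF-8: leave the input untouched
}

// Hypothetical helper: count runs of letters and digits in any script.
function example_count_words(string $text): int {
    return (int) preg_match_all('/[\p{L}\p{N}]+/u', $text);
}

$sample = 'Zażółć gęślą jaźń';
echo example_trim_chars($sample, 6), "\n";
echo example_count_words($sample), "\n";
```

    Note that counting words this way relies on spaces or punctuation between words, so it treats an unbroken run of Chinese text as a single word.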

    For questions related to any of our premium add-ons, be it pre-sales or other, please only contact us via a premium support ticket on this link: https://www.wprssaggregator.com/contact/

    That being said, the proposed coded solution you have provided seems to add a handler for the hook wprss_ftp_dumb_converter_post_content, but this hook does not exist in our codebase. In fact, the Feed to Post add-on does not contain that string, or the string “dumb”, anywhere, so adding the code of the presented solution will not have any effect on a normal installation.

    A UTF-8 string cannot be converted to ASCII and then back. ASCII characters consist of 1 byte, while UTF-8 characters may have up to 4 bytes (that’s why they’re called “multibyte”). So, unless the original string is made only of ASCII characters, converting it to ASCII would cause it to lose information, and possibly become corrupted. Lost data cannot be restored. So, even if the proposed solution had any effect, it would not solve the problem.
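    The point about byte widths and information loss can be shown with a small sketch. UTF-8 as currently standardized uses 1 to 4 bytes per character. The function example_to_ascii() is a hypothetical stand-in for a lossy charset conversion that substitutes "?" for anything outside ASCII; it is not code from the plugin.

```php
<?php
// strlen() counts bytes, so it reveals how wide each UTF-8 character is.
$widths = [];
foreach (['A', 'é', '中', '😀'] as $char) {
    $widths[$char] = strlen($char);
    echo $char, ' => ', $widths[$char], " byte(s)\n";
}

// Hypothetical lossy conversion: replace every non-ASCII character with "?".
function example_to_ascii(string $utf8): string {
    return preg_replace('/[^\x00-\x7F]/u', '?', $utf8) ?? '';
}

// Two different strings collapse to the same ASCII text,
// so nothing can recover the originals afterwards.
$a = example_to_ascii('中文');
$b = example_to_ascii('日本');
var_dump($a === $b);
```

    This is different from escaping characters as HTML entities, which keeps the code points in ASCII form and can be reversed.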

    At this point, we do not advise anyone, whether a customer of WP RSS Aggregator or otherwise, to add the proposed solution above, as we cannot guarantee that this code won’t cause irreparable damage to a WordPress site.

    cc: @modlook could you please make sure the code is not visible until it is verified that it is a viable solution and it does not cause any type of data loss.

    Thread Starter WPChina

    (@wordpresschina)

    No problem, I will contact you at your support site instead. I tested it on wp-rss-feed-to-post/includes/wprss-ftp-converter.php, and you can see the filter on line 304.

    WPRSS_FTP_Utils::log( 'Applying post_content filter...', __FUNCTION__, WPRSS_FTP_Utils::LOG_LEVEL_SYSTEM );
    $post_content = apply_filters( 'wprss_ftp_converter_post_content', $item->get_content(), $source );

    Sorry, the “dumb” was a placeholder I use for testing and I forgot to find/replace it before posting here. So the hook should actually be wprss_ftp_converter_post_content, and since that is part of an add-on to your free plugin, it is best to contact you on your site.
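    With the placeholder removed, the snippet would look roughly like the sketch below. It assumes the filter is applied with two arguments (the item content and the feed source), as in the apply_filters() call quoted above. The callback name example_decode_post_content() is hypothetical, and the function_exists() guard is only there so the file can also run outside WordPress.

```php
<?php
// Callback for the 'wprss_ftp_converter_post_content' filter.
// $content is the aggregated item content, $source is the feed source.
function example_decode_post_content( $content, $source ) {
    // Turn HTML entities (named and numeric) back into UTF-8 characters.
    return html_entity_decode( $content, ENT_QUOTES, 'UTF-8' );
}

// Register only when WordPress is loaded, so the file also runs standalone.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'wprss_ftp_converter_post_content', 'example_decode_post_content', 10, 2 );
}
```

    Keep in mind that html_entity_decode() also decodes entities that were escaped on purpose, such as &lt; and &amp;, so it is worth testing on a staging site before using it on live content.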

    • This reply was modified 6 years, 9 months ago by WPChina. Reason: edited path to file
  • The topic ‘Fix for this plugin messing up non-Western languages’ is closed to new replies.