• It’s a pity; they could at least use the default WordPress search engine for these characters. Instead, the entire search stops working, even for exact words. Change my mind.

    ————————————————————————————–

    It seems there is a workaround for this problem.

    • This topic was modified 3 years, 6 months ago by elfigos.
Viewing 15 replies - 1 through 15 (of 18 total)
  • Plugin Author Mikko Saari

    (@msaari)

    Sure I’ll change your mind: Relevanssi actually has zero problems with Chinese or Korean characters if you use UTF-8. The problem with Chinese and Japanese is that the Relevanssi algorithm needs words, and Chinese and Japanese text lacks word separators. That’s the only problem. It’s a big one, sure, and since I don’t read Chinese or Japanese myself, I can’t fix it. Fortunately with Chinese there’s a jieba tool available that can split Chinese words so that Relevanssi works well. See here for more information and links: https://www.relevanssi.com/knowledge-base/relevanssi-and-languages/

    I think the Korean language has spaces between words, in which case there should be no problems with Relevanssi and Korean.

    So it takes some effort, but it’s possible to set up Relevanssi so that it works well with Chinese.

    Thread Starter elfigos

    (@elfigos)

    What do you mean by “use UTF8”? Do I need to change something in the database, or just add the code to the HTML header?

    Plugin Author Mikko Saari

    (@msaari)

    Your database should use UTF-8; the HTML page encoding doesn’t matter for Relevanssi. That’s the default setting in WordPress, so if you haven’t modified it, it should already be correct.

    Thread Starter elfigos

    (@elfigos)

    My database and tables are all using utf8mb4_unicode_520_ci

    Plugin Author Mikko Saari

    (@msaari)

    Ok, so there should be no problems in indexing any characters at all. So is your site in Chinese, Japanese, Korean or a mixture of languages?

    If your site is in Korean, it should just work.

    If it’s in Chinese, you need the phpjieba tool on your site, then

    add_filter( 'relevanssi_remove_punctuation', 'rlv_use_jieba' );
    function rlv_use_jieba( $string ) {
        $string = jieba( $string, 1, 1500 );
        $string = @implode( ' ', $string );
        return $string;
    }

    will split the text into smaller parts so that Relevanssi can handle it better.
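
    Note that the snippet above calls jieba() unconditionally, and calling an undefined function is a fatal error in PHP, so the search breaks if the phpjieba extension is not loaded. A more defensive variant could look like the sketch below. The name rlv_use_jieba_if_available is made up for this example, the jieba() arguments are copied unchanged from the snippet above, and the array return value is an assumption based on the implode() call there.

```php
<?php
/**
 * Hypothetical defensive variant of rlv_use_jieba(): segments the text
 * only when the jieba() function actually exists.
 */
function rlv_use_jieba_if_available( $string ) {
    // Without the phpjieba extension, jieba() is undefined and calling it
    // would be a fatal error, so hand the text back untouched instead.
    if ( ! function_exists( 'jieba' ) ) {
        return $string;
    }
    // Same arguments as in the snippet above.
    $words = jieba( $string, 1, 1500 );
    // Assumption: jieba() returns an array of words on success.
    return is_array( $words ) ? implode( ' ', $words ) : $string;
}

// Register the filter only inside WordPress, where add_filter() exists.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'relevanssi_remove_punctuation', 'rlv_use_jieba_if_available' );
}
```

    With this variant the search keeps working without segmentation when the extension is missing, which makes an installation problem easier to tell apart from a Relevanssi problem.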

    If your site is in Japanese, then you need a similar tool for Japanese; I’m not aware of such tools, but I suppose they do exist.

    If you still have problems making the search work, it’s possible your theme is incompatible with Relevanssi. That happens, and I’m happy to help – but I’d much rather help you out in the support forums than in a 1-star review thread.

    Thread Starter elfigos

    (@elfigos)

    Hi, sorry for the delay. I changed the review; sorry for that immediate bad review, but I was a little upset about other things. I was really busy over the past few weeks and forgot about this issue. I have now installed Jieba, but when I search for something it gives a “jieba” function not found error.

    I copied the repository to my own plugin folder and ran the commands below:

    git clone https://github.com/jonnywang/phpjieba.git
    cd phpjieba/cjieba
    make
    
    cd ..
    phpize
    ./configure
    make
    make install

    I added these lines of code to my plugin’s main file:

    ini_set('jieba.enable', 1);
    ini_set('jieba.dict_path', 'widgets/phpjieba-master/cjieba/dict');

    And I added your code to functions.php:

    add_filter( 'relevanssi_remove_punctuation', 'rlv_use_jieba' );
    function rlv_use_jieba( $string ) {
        $string = jieba( $string, 1, 1500 );
        $string = @implode( ' ', $string );
        return $string;
    }
    • This reply was modified 3 years, 6 months ago by elfigos.
    Thread Starter elfigos

    (@elfigos)

    What seems strange is that you said it needs to split the words, but Relevanssi gives me no results even if I search for only one character.

    • This reply was modified 3 years, 6 months ago by elfigos.
    Plugin Author Mikko Saari

    (@msaari)

    I’m not able to help you with the jieba installation; I’ve never done that myself. I notice you’re setting the two jieba settings with ini_set(). Do you have the extension=jieba.so line in your php.ini, too?
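
    One way to check this from PHP itself is a short diagnostic along the lines of the sketch below. The helper name rlv_jieba_diagnostics is invented for this example, and the extension name 'jieba' is an assumption based on the extension=jieba.so line; the PHP functions it calls are standard.

```php
<?php
/**
 * Collects a few facts about the PHP setup that help explain a
 * "jieba function not found" error. The extension name 'jieba' is an
 * assumption based on the extension=jieba.so line.
 */
function rlv_jieba_diagnostics() {
    return array(
        // Which php.ini this PHP process actually loaded (false if none).
        'php_ini'          => php_ini_loaded_file(),
        // True only when the extension was loaded at startup.
        'extension_loaded' => extension_loaded( 'jieba' ),
        // True only when the jieba() function can be called.
        'function_exists'  => function_exists( 'jieba' ),
        // false means PHP does not know this setting at all.
        'dict_path'        => ini_get( 'jieba.dict_path' ),
    );
}

var_dump( rlv_jieba_diagnostics() );
```

    It is worth running this through the web server rather than on the command line, because the two can load different php.ini files.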

    The reason Relevanssi is not giving you one-character results is that, by default, Relevanssi is set up to ignore one-character searches. You can enable one-letter searching with

    add_filter( 'relevanssi_block_one_letter_searches', '__return_false' );

    (see: relevanssi_block_one_letter_searches)

    Thread Starter elfigos

    (@elfigos)

    It was all a misunderstanding. I kept searching with only one character, which is why I couldn’t find any results. I didn’t try multiple characters (not alphabet letters) until you told me that Relevanssi doesn’t handle single-character searches by default. That’s why I thought it doesn’t work at all with Chinese, Korean and Japanese characters.

    I added

    add_filter( 'relevanssi_block_one_letter_searches', '__return_false' );

    But it’s returning strange results even with alphabetic words. It shows only 3 results for every one-letter search, and those results are not the right ones.

    Plugin Author Mikko Saari

    (@msaari)

    Relevanssi doesn’t match inside words by default. You can enable that with some code; you can find the snippet in the Relevanssi help (click Help in the top right of the Relevanssi settings page). That should help with this, but yes, Relevanssi searching without word breaks is tricky, and even though you can get Relevanssi to match inside words, the weights will be wonky. Jieba is necessary to get proper weightings.
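
    To illustrate the difference, here is a small stand-alone sketch. It assumes that partial matching means the indexed word begins or ends with the search term, whereas inside-word matching corresponds to SQL LIKE '%term%'. Both function names are made up, case handling is ignored, and the real matching happens in SQL inside Relevanssi, not in PHP.

```php
<?php
// Assumed rule for partial matching: the indexed word begins or ends with the term.
function rlv_demo_matches_word_edge( $word, $term ) {
    $len = strlen( $term );
    if ( 0 === $len || $len > strlen( $word ) ) {
        return false;
    }
    // Prefix match, or suffix match on the last $len bytes.
    return 0 === strncmp( $word, $term, $len ) || substr( $word, -$len ) === $term;
}

// Inside-word matching, roughly what SQL LIKE '%term%' does.
function rlv_demo_matches_inside( $word, $term ) {
    return '' !== $term && false !== strpos( $word, $term );
}

var_dump( rlv_demo_matches_word_edge( 'asdf', 'as' ) ); // bool(true): prefix
var_dump( rlv_demo_matches_word_edge( 'asdf', 's' ) );  // bool(false): only inside
var_dump( rlv_demo_matches_inside( 'asdf', 's' ) );     // bool(true)
```

    Under that assumption, a search term that only occurs in the middle of a word is found only once inside-word matching is enabled.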

    Thread Starter elfigos

    (@elfigos)

    Suppose my post is “asdf”. If I search for “as”, I can find it, but if I search for “s”, I get only 3 results, which don’t even match the search term. Some other single characters don’t work either, even if I set

    add_filter( 'relevanssi_block_one_letter_searches', '__return_false');

    • This reply was modified 3 years, 6 months ago by elfigos.
    Thread Starter elfigos

    (@elfigos)

    Another strange thing I found: I have 600 posts with the word “ABCDE”. If I search for “AB”, I get 370 results; if I search for “ABC”, I get 500 results. If I search for “A”, I get 3 results.

    • This reply was modified 3 years, 6 months ago by elfigos.
    Plugin Author Mikko Saari

    (@msaari)

    What’s your “Keyword matching” setting set to? That might explain something like this, if it’s not set to “Partial words”. If throttling is enabled, 500 would be the correct result. It’s hard to say in more detail what’s up with that, without knowing more about the circumstances.

    Thread Starter elfigos

    (@elfigos)

    Default operator: OR
    Keyword matching: Partial words
    relevanssi_fuzzy_query: not activated inside words
    Throttle searches: activated
    Minimum word length: 1

    I set content = '' before indexing, keeping only the title.

    I have more than 600 post titles containing the word “Pinyin”.

    Search “p”: 0 results.
    Search “pi”: 488 results.
    Search “pin”: 500 results.

    My considerations:

    – It seems that, by default, Relevanssi searches inside words from three letters up, works poorly with two letters, and does not work at all with one letter.

    – If I activate

    add_filter( 'relevanssi_fuzzy_query', 'rlv_partial_inside_words' );
    function rlv_partial_inside_words( $query ) {
        return "(relevanssi.term LIKE '%#term#%')";
    }

    searching inside terms still does not work for one letter.

    Plugin Author Mikko Saari

    (@msaari)

    I took a closer look at this, and there’s indeed a mechanism in Relevanssi I had forgotten about: fuzzy matching is always disabled for single-letter searches. That generally makes sense, because a fuzzy single-letter search almost always returns all posts in the database.

    add_filter( 'relevanssi_term_where', function( $where ) {
        $where = preg_replace( "/relevanssi.term = '(.)'/", "relevanssi.term LIKE '%$1%'", $where );
        return $where;
    });

    This will enable fuzzy matching for single-letter searching.
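
    One caveat for Chinese, Japanese and Korean text: without the u modifier, the (.) in the pattern matches a single byte, and a CJK character takes several bytes in UTF-8, so a one-character CJK search would not be rewritten. A possible Unicode-aware variant is sketched below. The function name rlv_single_char_like is invented for this example, and the shape of the WHERE fragment is taken from the snippet above rather than verified against the Relevanssi source.

```php
<?php
/**
 * Hypothetical Unicode-aware variant of the filter above: turns an exact
 * single-character match into a LIKE match, including multibyte characters.
 */
function rlv_single_char_like( $where ) {
    // The u modifier makes (.) match one UTF-8 character instead of one byte.
    $result = preg_replace(
        "/relevanssi\.term = '(.)'/u",
        "relevanssi.term LIKE '%$1%'",
        $where
    );
    // preg_replace() returns null on failure (for example on invalid UTF-8),
    // so fall back to the unmodified clause in that case.
    return null === $result ? $where : $result;
}

// Register the filter only inside WordPress, where add_filter() exists.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'relevanssi_term_where', 'rlv_single_char_like' );
}

echo rlv_single_char_like( "relevanssi.term = 'p'" ), "\n";  // relevanssi.term LIKE '%p%'
echo rlv_single_char_like( "relevanssi.term = 'pi'" ), "\n"; // relevanssi.term = 'pi'
```

    Longer search terms pass through unchanged, so only single-character searches are affected.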

  • The topic ‘Not working with special characters like Chinese, Korean and Japanese’ is closed to new replies.