• Resolved b.l.k

    (@blk-1)


    Hello,

    In the function relevanssi_index_doc(), strip_tags() is used at line 2353:

    $contents = relevanssi_strip_invisibles($contents);
    $contents = strip_tags($contents);
    $contents = relevanssi_tokenize($contents);

    This causes a problem with some text. For example:

    <p>my text</p>
    <p>my second text</p>

    We get "textmy" in the list of indexed words.
    Wouldn’t it be better to use preg_replace() to replace each tag with a space, so that the words are indexed separately?

    Thank you for your answer.

Viewing 4 replies - 1 through 4 (of 4 total)
  • strip_tags() does not remove whitespace, but if the original text is

    <p>my text</p><p>my second text</p>

    then yes, there’s a problem. I suppose something like

    <[a-zA-Z\/][^>]*>

    would do the trick, without running into terrible problems.
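
    A minimal standalone sketch of the behaviour discussed above, assuming plain PHP with no Relevanssi code loaded. The final whitespace-collapsing step is not part of the suggestion in this thread; it is added here only to make the output easier to read.

```php
<?php
// Sample input from the thread: two paragraphs with no whitespace between them.
$contents = '<p>my text</p><p>my second text</p>';

// strip_tags() alone removes the tags and glues the neighbouring words together.
$stripped = strip_tags($contents);

// Replacing each tag with a space first keeps the words apart.
$spaced = preg_replace('/<[a-zA-Z\/][^>]*>/', ' ', $contents);
$spaced = strip_tags($spaced);

// Extra step for readable output only (an assumption, not from the thread).
$spaced = trim(preg_replace('/\s+/', ' ', $spaced));

echo $stripped, "\n"; // my textmy second text
echo $spaced, "\n";   // my text my second text
```

    The pattern only matches a "<" that is followed by a letter or a slash, so a bare "<" in ordinary text (for example "a < b") is left alone by the replacement.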

    Thread Starter b.l.k

    (@blk-1)

    Thank you for your answer. I added:
    $contents = preg_replace('/<[a-zA-Z\/][^>]*>/', ' ', $contents);
    before
    $contents = strip_tags($contents);

    and now the resulting index looks pretty good!

    I could also add:
    $pcoms = preg_replace('/<[a-zA-Z\/][^>]*>/', ' ', $pcoms);
    between:
    $pcoms = relevanssi_strip_invisibles($pcoms);
    and

    $pcoms = strip_tags($pcoms);
    $pcoms = relevanssi_tokenize($pcoms);

    to fix the indexing of comments in the same way.
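
    Since the same two steps are applied to both the post content and the comments, they could be wrapped in one function. The sketch below is only an illustration: example_strip_tags_with_spaces() is a hypothetical name, not a Relevanssi function, and preg_split() is used as a crude stand-in for relevanssi_tokenize(), which is not available outside the plugin.

```php
<?php
/**
 * Hypothetical helper (not part of Relevanssi): replace each tag with a
 * space, then let strip_tags() remove anything the pattern did not match.
 */
function example_strip_tags_with_spaces(string $html): string
{
    $spaced = preg_replace('/<[a-zA-Z\/][^>]*>/', ' ', $html);
    // preg_replace() returns null on a regex error; fall back to the input.
    if ($spaced === null) {
        $spaced = $html;
    }
    return strip_tags($spaced);
}

// Stand-ins for the post body and the comment text handled in the thread.
$contents = '<p>my text</p><p>my second text</p>';
$pcoms    = '<p>first comment</p><p>second comment</p>';

// Splitting on whitespace is only a rough substitute for a real tokenizer.
$content_words = preg_split('/\s+/', example_strip_tags_with_spaces($contents), -1, PREG_SPLIT_NO_EMPTY);
$comment_words = preg_split('/\s+/', example_strip_tags_with_spaces($pcoms), -1, PREG_SPLIT_NO_EMPTY);

print_r($content_words);
print_r($comment_words);
```

    With this input, "text" and "my" stay separate words in the content, and "comment" and "second" stay separate words in the comments.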

    Would it be possible to include this fix in an upcoming version?

    Thank you again!

    Yeah, this is already on my to-do list for the next version. I’ll fix the comments as well.

    Thread Starter b.l.k

    (@blk-1)

    Thank you for your answers and your plugin!

  • The topic ‘[Plugin: Relevanssi – A Better Search] strip_tags issue’ is closed to new replies.