• Problem: The plugin won’t auto-link keywords that have question marks or other punctuation in them (specifically, at the beginning or end of the keyword). This may also cause problems with keywords containing non-ASCII (i.e. Unicode) characters.

    Cause: get_kw_regex() in inc/front.php uses an incorrect pattern. The pattern used is:


    return sprintf('/(\b)(%s)(\b)/ui', implode('|', $keywords));

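    \b matches at a position where a word character and a non-word character are adjacent (or at the start or end of the string next to a word character), and by default “word” characters only include [A-Za-z0-9_]. So a keyword ending in “?” only gets a word boundary after it when the next character is a word character. At the end of a sentence, before a space, or at the end of the text, the pattern never matches, because “?” is not a word character.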
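    A minimal reproduction of the failure. It assumes the keywords are passed through preg_quote() before being joined. The post doesn't show whether get_kw_regex() escapes them, and unescaped, “?” would act as a quantifier. build_kw_regex() is a hypothetical stand-in, not the plugin's actual function.

```php
<?php
// Hypothetical stand-in for get_kw_regex(): same \b-based pattern,
// but each keyword is escaped so "?" is matched literally.
function build_kw_regex(array $keywords): string
{
    $escaped = array_map(function ($kw) {
        return preg_quote($kw, '/');
    }, $keywords);
    return sprintf('/(\b)(%s)(\b)/ui', implode('|', $escaped));
}

$regex = build_kw_regex(['foo?']);

// "?" followed by a space: both sides are non-word characters, so no \b.
var_dump(preg_match($regex, 'Is it foo? Maybe.'));   // int(0)

// "?" at the very end: \b there needs the last character to be a word character.
var_dump(preg_match($regex, 'Is it foo?'));          // int(0)

// Only when a word character follows the "?" does a boundary exist.
var_dump(preg_match($regex, 'Is it foo?bar'));       // int(1)
```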

    Solution: Instead of using word boundaries, just use non-word characters:


    return sprintf('/(\W)(%s)(\W)/ui', implode('|', $keywords));

    I imagine this could be the cause of the reported Unicode problems too, since a keyword beginning or ending with a non-ASCII letter would not create a word boundary either if \w only covers [A-Za-z0-9_].
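    A sketch comparing the proposed \W pattern with a lookaround variant using (?<!\w) and (?!\w). The lookaround version is my substitution, not something suggested in the thread. Both accept punctuation at the edges of a keyword, but \W has to consume a real character on each side, so it cannot match a keyword at the very start or end of the text, and the surrounding characters become part of the match (a replacement would need to put $1 and $3 back). The lookarounds only assert that no word character is adjacent, which is also true at the string edges. Keywords are again assumed to be preg_quote()d, and kw_alternation() is a hypothetical helper.

```php
<?php
// Hypothetical helper: join escaped keywords into one alternation.
function kw_alternation(array $keywords): string
{
    $escaped = array_map(function ($kw) {
        return preg_quote($kw, '/');
    }, $keywords);
    return implode('|', $escaped);
}

$alt = kw_alternation(['foo?']);

// Pattern proposed in the post: a non-word character must exist on both sides.
$nonWord = sprintf('/(\W)(%s)(\W)/ui', $alt);

// Alternative (not from the thread): zero-width lookarounds that only require
// "no word character here", which also holds at the start/end of the string.
$lookaround = sprintf('/(?<!\w)(%s)(?!\w)/ui', $alt);

foreach (['Is it foo? Maybe.', 'foo? is at the start', 'ends with foo?'] as $text) {
    printf(
        "%-22s  \\W: %d  lookaround: %d\n",
        $text,
        preg_match($nonWord, $text),
        preg_match($lookaround, $text)
    );
}
```

    On these three strings the \W pattern matches only the first, while the lookaround version matches all three.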

    https://www.ads-software.com/extend/plugins/seo-auto-linker/

  • Plugin Author chrisguitarguy

    (@chrisguitarguy)

    A word boundary is a position in the subject string where the current character and the previous character do not both match \w or \W (i.e. one matches \w and the other matches \W), or the start or end of the string if the first or last character matches \w, respectively.

    And

    A “word” character is any letter or digit or the underscore character, that is, any character which can be part of a Perl “word”. The definition of letters and digits is controlled by PCRE’s character tables, and may vary if locale-specific matching is taking place. For example, in the “fr” (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w.

    So it seems \b is defined in terms of \w and \W? I also suspect that both \W and \b change when the Unicode flag (/u) is specified.

    Source: https://php.net/manual/en/regexp.reference.escape.php
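    To check the reading that \b is defined purely in terms of \w and \W, here is a sketch that compares the offsets where \b matches with a hand-written equivalent built from lookarounds: a position is a boundary when a word character sits on exactly one side of it. The sample text and the match_offsets() helper are illustrative only.

```php
<?php
// Hypothetical helper: list every offset at which a zero-width pattern matches.
function match_offsets(string $pattern, string $text): array
{
    preg_match_all($pattern, $text, $m, PREG_OFFSET_CAPTURE);
    return array_map(function ($hit) {
        return $hit[1];
    }, $m[0]);
}

$text = 'foo? bar_baz, 42!';

// Built-in word boundary.
$builtin = match_offsets('/\b/', $text);

// The manual's definition spelled out: a word char behind and none ahead,
// or no word char behind and one ahead (string edges count as "none").
$manual = match_offsets('/(?<=\w)(?!\w)|(?<!\w)(?=\w)/', $text);

var_dump($builtin === $manual); // bool(true)
print_r($builtin);
```

    On this sample both lists come out as offsets 0, 3, 5, 12, 14 and 16. There is no boundary at offset 4, right after the “?”, which is the case described in the original post.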

    Thread Starter bouncesquad

    (@bouncesquad)

    I didn’t test the Unicode theory, just punctuation. But \b, \B, \w, \W probably don’t change when /u is used. See point 6 on this page:

    https://www.exim.org/viewvc/pcre/code/trunk/doc/html/pcreunicode.html?view=co

    That’s the default behavior of PCRE even with Unicode enabled, but the PHP manual doesn’t say exactly what /u changes.
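    Since the Unicode theory wasn't tested, here is a quick probe to run on the actual server. It reports whether \w, and therefore \b, treats an accented letter as a word character with and without /u. Without /u, PCRE works on raw bytes, and under the default character tables the two UTF-8 bytes of “é” are not word characters. The /u result depends on whether the PCRE library that PHP uses applies Unicode properties to \w, which is the behavior point 6 of the PCRE page describes. The sample words are arbitrary.

```php
<?php
// Sample word "café"; "\u{e9}" (PHP 7+) yields the UTF-8 bytes for "é".
$word = "caf\u{e9}";
$kw   = "\u{e9}clair"; // a keyword that starts with a non-ASCII letter

// Without /u, PCRE works on raw bytes: is every byte a word character?
$bytes = preg_match('/^\w+$/', $word);

// With /u the subject is read as UTF-8; does \w now cover "é"?
$utf8 = preg_match('/^\w+$/u', $word);

// Same question for \b: is there a boundary between a space and "é"?
$boundary = preg_match('/\b' . $kw . '\b/u', 'un ' . $kw);

printf("\\w no /u: %d | \\w with /u: %d | \\b with /u: %d\n", $bytes, $utf8, $boundary);
```

    The last two numbers should always agree (both 0 or both 1), which fits the manual's description of \b as following whatever \w does.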

    Plugin Author chrisguitarguy

    (@chrisguitarguy)

    I just tested with a ? and other punctuation and it seems to work fine on my local server (nginx + PHP 5.3.10).

    I’m not sure. This Unicode and regex stuff is the painful part of this plugin. :/

  • The topic ‘[Plugin: SEO Auto Linker] Regex pattern incorrect’ is closed to new replies.