• Resolved moody1337

    (@moody1337)


    Hey guys,

    I have been using your plugin for a couple of years now, and some spam is getting through without being blocked. For an example, please see this screenshot: https://www.wihel.de/wp-content/uploads/2020/11/spam.png

    I have not been able to figure out how to adjust things on my end to block it. I already tried some regex as a custom pattern in addition to the host pattern in antispam_bee.php, but it's not working. This kind of spam is only blocked once some of it has been marked as spam manually (the reason given is then “local spam-db”).

    As far as I can tell, they all have one thing in common: the URL consists of only one word. So a rule like “if no TLD is given (AND the comment body is also only one word), it is most likely spam” might work.
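
The heuristic described here can be written down as a small check. This is only a sketch in Python for illustration; it is not part of Antispam Bee, and the function name `looks_like_one_word_spam` is hypothetical. It assumes that "no TLD" means "the host part of the URL contains no dot".

```python
import re
from urllib.parse import urlparse


def looks_like_one_word_spam(url: str, body: str) -> bool:
    """Hypothetical heuristic: dotless host (no TLD) AND a one-word comment body."""
    # urlparse only fills .hostname when a scheme is present, so add one if missing.
    candidate = url if "://" in url else "http://" + url
    host = urlparse(candidate).hostname or ""
    host_has_no_tld = host != "" and "." not in host
    # A "one-word" body is a single run of non-whitespace characters.
    body_is_one_word = re.fullmatch(r"\S+", body.strip()) is not None
    return host_has_no_tld and body_is_one_word


print(looks_like_one_word_spam("https://test", "kjhgfdsaqwertzuiop"))      # True
print(looks_like_one_word_spam("https://www.test.de", "kjhgfdsaqwertzui"))  # False
print(looks_like_one_word_spam("https://test", "a normal comment"))         # False
```

A rule like this would also flag legitimate dotless hosts such as `localhost`, so it is best read as a starting point rather than a finished filter.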

    Is there anyone who can help?


Viewing 6 replies - 1 through 6 (of 6 total)
  • Plugin Contributor Torsten Landsiedel

    (@zodiac1978)

    Hi @moody1337

    can you please report this spam to us via our form:
    https://docs.google.com/forms/d/e/1FAIpQLSeQlKVZZYsF1qkKz7U78B2wy_6s6I7aNSdQc-DGpjeqWx70-A/viewform?c=0&w=1

    This is easier to work with than a screenshot.

    Thanks in advance!

    “I already tried some regex as a custom pattern in addition to the host pattern in antispam_bee.php, but it's not working.”

    I am here to help!

    Can you share what you have at the moment, and which spam you want to detect with it? I will then try to reproduce this and get the pattern running.

    All the best,
    Torsten

  • Plugin Contributor Torsten Landsiedel

    (@zodiac1978)

    The comments from the screenshot should already be caught by this regex pattern:
    https://github.com/pluginkollektiv/antispam-bee/blob/master/antispam_bee.php#L1560-L1562

    I wonder why this is not happening. Do you trust users with a Gravatar, and do those comments/email addresses have a Gravatar?

    Just to be sure: have you enabled “Use regular expressions”?

    All the best,
    Torsten

  • Thread Starter moody1337

    (@moody1337)

    Hey Torsten,

    thanks for getting back so quickly. A few seconds ago I submitted the form, filled out as well as I could (I was not sure about the user agent; I hope it’s the “comment-agent”).

    Also, important to say: I’m not really a developer. I mostly read parts of the plugin’s code, stumbled through the support forum, and tried to understand a bit and adapt it.

    Now to your questions:

    Regex I tried out

    I thought maybe I can extend the host-part in

    array(
        'host' => '^(www\.)?fkbook\.co\.uk$|^(www\.)?nsru\.net$|^(www\.)?goo\.gl$|^(www\.)?bit\.ly$',
    ),

    for example by adding one of the following:

    |^(http|https)(\:\/\/)(www)*(\w+){1}$

    |^(http|https)(\:\/\/)(www)*(\[a-z0-1]+){1}$

    Worth mentioning: whenever I have to do something with regex, I use https://regexr.com/. Before implementing the patterns, I tested them with:

    1. https://test (should match the regex and so be marked as spam)
    2. https://test (should match the regex and so be marked as spam)
    3. https://www.test.de (should not match the regex and so not be marked as spam)
    4. https://test.de (should not match the regex and so not be marked as spam)

    The regexes work to some extent, but they also mark comments like examples 3 and 4 as spam. So in the end they are not working, or let’s say they “work too well” and mark more than they should.
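
A test list like this can also be checked with a small script instead of by hand. The sketch below uses Python's `re` module, which is an assumption made for convenience: the plugin itself is PHP (PCRE), and the thread does not show which string the plugin actually compares against the `host` pattern, so this only exercises the first pattern from the post against the four full URLs in isolation. The helper name `check` is hypothetical.

```python
import re

# First pattern from the post, without the leading "|" that would join it
# to the existing host alternation.
CANDIDATE = r"^(http|https)(\:\/\/)(www)*(\w+){1}$"

# The four test strings from the post, each with the outcome the poster wanted.
CASES = [
    ("https://test", True),
    ("https://test", True),
    ("https://www.test.de", False),
    ("https://test.de", False),
]


def check(pattern, cases):
    """Return (text, expected, actual) for every test case."""
    compiled = re.compile(pattern)
    return [(text, expected, compiled.search(text) is not None)
            for text, expected in cases]


for text, expected, actual in check(CANDIDATE, CASES):
    status = "ok" if expected == actual else "MISMATCH"
    print(f"{status:8} {text!r}: expected={expected} actual={actual}")
```

Run this way, each string is tested on its own, so `^` and `$` anchor to that string alone. A pattern that passes here can still behave differently inside the plugin if the field it is applied to holds something other than the full URL.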

    I also tried the same regexes as an additional array, e.g.

    array(
        'rawurl' => '^(http|https)(\:\/\/)(www)*(\w+){1}$',
    ),

    Your example on GitHub

    If I’m not wrong, the curly brackets and the number within them mean “exactly as many as the number in the brackets”. That means: if the body consists of only one word with a length of 30 characters, it works, but with more or fewer than 30 characters it fails and so the comment is not marked as spam.

    Gravatars

    I disabled Gravatars completely for GDPR reasons (you know, the stupid German data-protection law … for me, at least, the easiest thing was to just get rid of them).

    “Use regular expressions”

    Yes, I do (double-checked it, just to be sure).

    Some more thoughts from my side

    To me, the common pattern seems to be the strange-looking URL. These URLs have no subdomain, no dot, and no TLD at the end. That could be a good starting point for a solution, although I’m too dumb to figure out the implementation. Another starting point, possibly in addition, might be the comment body: it is always just one gibberish word, and quite a long one (though not always the same length).

  • Plugin Contributor Torsten Landsiedel

    (@zodiac1978)

    Hi @moody1337

    “If I’m not wrong, the curly brackets and the number within them mean ‘exactly as many as the number in the brackets’. That means: if the body consists of only one word with a length of 30 characters, it works, but with more or fewer than 30 characters it fails and so the comment is not marked as spam.”

    Exactly. The screenshot shows comments which match exactly this pattern: 10 characters for author and host, and 30 characters in the body. This should have worked. I need to double-check whether the host field maybe has “http(s)://” added by default.
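
The difference between an exact quantifier and a range can be seen in a few lines. This is a sketch in Python, and the character class `[a-z0-9]` is only a placeholder assumption; the plugin's real patterns are the ones in the linked source file.

```python
import re

# Placeholder character class, NOT the plugin's actual pattern.
EXACT_30 = re.compile(r"^[a-z0-9]{30}$")        # exactly 30 characters
RANGE_20_40 = re.compile(r"^[a-z0-9]{20,40}$")  # anything from 20 to 40 characters

for length in (29, 30, 31):
    word = "a" * length
    print(length, bool(EXACT_30.match(word)), bool(RANGE_20_40.match(word)))

# 29 False True
# 30 True True
# 31 False True
```

An exact count keeps false positives low, because an ordinary comment rarely consists of one word of precisely that length, while a range catches more variants at the cost of being less specific.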

    This pattern has been part of ASB (Antispam Bee) since 2.9.2:
    https://github.com/pluginkollektiv/antispam-bee/pull/333

    “I disabled Gravatars completely for GDPR reasons (you know, the stupid German data-protection law … for me, at least, the easiest thing was to just get rid of them).”

    I’m German myself and I am very aware of the GDPR. If you want to, we can switch to German …

    Maybe the better way is to block the user agent (python-requests).

    See this PR for more information about this idea:
    https://github.com/pluginkollektiv/antispam-bee/pull/323
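
The thread does not show how the user agent was eventually blocked, so the following is only a rough sketch of the idea in Python, not the plugin's or any server's actual configuration. The names `BLOCKED_AGENT_SUBSTRINGS` and `is_blocked_user_agent` are hypothetical.

```python
# Substrings that mark a request as unwanted; "python-requests" is the one
# named in the thread. The list itself is an assumption for illustration.
BLOCKED_AGENT_SUBSTRINGS = ("python-requests",)


def is_blocked_user_agent(user_agent):
    """Return True if the User-Agent header contains a blocked substring."""
    ua = (user_agent or "").lower()  # treat a missing header as empty
    return any(token in ua for token in BLOCKED_AGENT_SUBSTRINGS)


print(is_blocked_user_agent("python-requests/2.25.0"))                     # True
print(is_blocked_user_agent("Mozilla/5.0 (X11; Linux x86_64) Firefox/83"))  # False
```

A user agent is trivial for a client to change, so a check like this only stops senders that leave the library default in place.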

    “whenever I have to do something with regex, I use https://regexr.com/.”

    I prefer https://regex101.com/, but thanks for the link. Testing on sites like these is best practice for something as complex as regex.

    All the best,
    Torsten

  • Thread Starter moody1337

    (@moody1337)

    Hey Torsten,

    yeah, I already thought about switching the language, but maybe there are also non-Germans out there with the same problem who want to follow our small discussion.

    You are right, I never counted the characters and just judged by how they looked. But since WordPress does not use a monospace font, that was just … dumb on my part.

    Good idea with the user agent! I changed the server config, and at least my tests looked good. We will see whether it stops the spam comments.

    Thanks for your help and of course the great discussion!

  • Plugin Contributor Torsten Landsiedel

    (@zodiac1978)

    You’re welcome!

    I will mark this as resolved for now. If you encounter any problems, please ping me again here or open a new thread.

    Thanks!

    All the best,
    Torsten

  • The topic ‘Spam not detected / How to block?’ is closed to new replies.