Hey Torsten,
thanks for getting back to me so quickly. A few seconds ago I sent the form, filled out as well as I could (I was not sure about the user agent; I hope it’s the “comment-agent”).
Also, important to say: I’m not really a developer. I have just tried to read parts of the plugin’s code, stumbled through the support forum, tried to understand a bit, and adapted it.
Now to your questions:
Regex I tried out
I thought maybe I could extend the host part in
array(
'host' => '^(www\.)?fkbook\.co\.uk$|^(www\.)?nsru\.net$|^(www\.)?goo\.gl$|^(www\.)?bit\.ly$',
),
for example by adding one of the following:
|^(http|https)(\:\/\/)(www)*(\w+){1}$
|^(http|https)(\:\/\/)(www)*(\[a-z0-1]+){1}$
Worth mentioning: whenever I have to do something with regex, I use https://regexr.com/ to test it before implementing it. I tested these with
1. https://test (should match the regex and so be marked as spam)
2. https://test (should match the regex and so be marked as spam)
3. https://www.test.de (should not match the regex and so not be marked as spam)
4. https://test.de (should not match the regex and so not be marked as spam)
Somehow the regexes work, but they also mark comments as spam when the URL looks like examples 3 and 4. So in the end they are not working, or let’s say “they work too well” and mark more than they should.
I also tried the same regexes as an additional array, e.g.
array(
'rawurl' => '^(http|https)(\:\/\/)(www)*(\w+){1}$',
),
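For illustration, here is a minimal sketch of how a pattern for “URL without a dot in the host” could be checked outside the plugin. It is plain PHP using preg_match, not the plugin’s own code. The pattern, the variable names, and the extra http://test URL are assumptions made for this sketch, and it assumes the pattern is applied to the full URL, which may not be how the plugin feeds the 'host' or 'rawurl' fields.

```php
<?php
// Hypothetical pattern (not taken from the plugin):
// scheme, then a host that contains no dot and no slash, then an optional trailing slash.
// "~" is used as the delimiter so the slashes do not need escaping.
$pattern = '~^https?://[^./]+/?$~i';

// Test URLs from the list above, plus an http variant added for this sketch.
$urls = array(
    'https://test',
    'http://test',
    'https://www.test.de',
    'https://test.de',
);

// Record for each URL whether the pattern matches it.
$results = array();
foreach ($urls as $url) {
    $results[$url] = preg_match($pattern, $url) === 1;
}

foreach ($results as $url => $matched) {
    echo $url, ' => ', $matched ? 'match' : 'no match', PHP_EOL;
}
```

With this pattern only the two dotless URLs match, because [^./]+ cannot cross the dot in www.test.de or test.de. Whether the same expression behaves that way inside the plugin depends on which part of the URL it is actually compared against.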
Your example at github
If I’m not wrong, the curly brackets with a number inside mean “exactly that many repetitions”. That means: if the body consists of only one word with a length of 30 characters, it works, but with more or fewer than 30 characters it fails, and so the comment is not marked as spam.
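To make the difference concrete, here is a small sketch comparing an exact count with an open-ended one. The two patterns and the threshold of 20 are hypothetical and chosen only for illustration; they are not the expression from the GitHub example.

```php
<?php
// Hypothetical patterns for a body that is a single "word":
$exactly_30  = '~^\w{30}$~';   // {30}  = exactly 30 word characters
$at_least_20 = '~^\w{20,}$~';  // {20,} = 20 or more word characters

$word_10 = str_repeat('a', 10);
$word_25 = str_repeat('a', 25);
$word_30 = str_repeat('a', 30);

$checks = array(
    'exact_25'    => preg_match($exactly_30, $word_25) === 1,   // false: 25 is not exactly 30
    'exact_30'    => preg_match($exactly_30, $word_30) === 1,   // true
    'at_least_10' => preg_match($at_least_20, $word_10) === 1,  // false: shorter than 20
    'at_least_25' => preg_match($at_least_20, $word_25) === 1,  // true
    'at_least_30' => preg_match($at_least_20, $word_30) === 1,  // true
);

var_dump($checks);
```

A range such as {20,40} would also be possible if an upper limit is wanted; which numbers make sense depends on how long the gibberish words really are.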
Gravatars
I disabled Gravatars completely for GDPR reasons (you know, the stupid German data-protection law … for me at least, the easiest thing was to just get rid of them).
“Use regular expressions”
Yes, I do (double-checked it, just to be sure).
Some more thoughts from my side
To me, the common pattern seems to be the strange-looking URL. These URLs never mention a subdomain and have no dot and TLD at the end. That could be a good starting point for a solution whose implementation I’m too dumb to figure out. An additional or alternative starting point might be the comment body: it’s always just one gibberish word, and a fairly long one (though not always the same length).