• How can the number of spam comments I get in a day exceed the number of visits my wp-stats records for that day? Don’t spammers have to “visit” in order to leave their crap? I’m just curious about this; it’s not like my site is broken.

    If someone can offer a lucid explanation (and don’t be afraid to get technical) I would love to know it. Thank you.

Viewing 4 replies - 1 through 4 (of 4 total)
  • Spammers don’t have to visit your site every time they post spam. They prefer not to.

    Every standard WordPress installation uses the same script with the same parameters to accept comments. All the spammer needs to know is your blog’s URL.

    There are plugins that change this. For example, I wrote one called Quiz that adds a labeled field to the comment form. The label tells the commenter what they must enter in the field to have their comment accepted. Spammers who are just using the standard parameters won’t be able to post comments. Even if it’s just “What is 2+2?” this stops the vast majority of automated spam.

    Thread Starter tixrus

    (@tixrus)

    Hi Andy, thanks for your answer. I realize that spammers don’t open up a browser window, but even an automated script has got to touch my server to post spam to the database. So I guess that changes my question: what exactly qualifies as a “visit”? I may have to get that plugin. Akismet flags and deletes most of the spam, but I’d just as soon they were blocked from posting it in the first place.

    It’s time to get technical because the answer involves both HTTP and JavaScript. A typical visit begins when the visitor’s browser sends an HTTP GET request to your blog’s server. This request communicates the desired URL and certain details about the visitor: an IP address, a description of the browser, and possibly the URL of the page where they clicked a link to the desired URL.

    Some stats packages count these GET requests, but there is a ton of noise in this raw data. A typical WordPress stats plugin (such as Jetpack) adds a snippet of JavaScript to the HTML instructing the browser to make an additional GET request. This allows the stats package to collect more information about the visiting browser and to filter out most of the “visits” from non-humans, since most bots don’t process JavaScript.
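    The difference between the two counting methods can be sketched in a few lines of Python. This is only a toy model, not how Jetpack or any real stats package is implemented; the `Client` class and the `simulate` function are made up for illustration, and it assumes every client loads the page exactly once.

```python
from dataclasses import dataclass


@dataclass
class Client:
    name: str
    runs_javascript: bool  # browsers do; most bots do not


def simulate(clients):
    """Return (page_gets, beacon_hits) for one page load per client."""
    page_gets = 0    # what a raw server-log counter would see
    beacon_hits = 0  # what a JavaScript-based stats plugin would see
    for client in clients:
        page_gets += 1  # every client GETs the page HTML
        if client.runs_javascript:
            # Only clients that execute the snippet make the extra
            # stats request, so only they are counted as visits.
            beacon_hits += 1
    return page_gets, beacon_hits


clients = [
    Client("human in a browser", True),
    Client("search crawler", False),
    Client("spam script", False),
]
page_gets, beacon_hits = simulate(clients)
print(page_gets, beacon_hits)  # 3 1
```

    Three clients touched the server, but only the one that ran the snippet shows up as a visit.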

    A comment is submitted by a different kind of request: an HTTP POST. (We use capital letters because that’s what the protocol specifies. It’s not an acronym.) A POST request is just like a GET except the POST carries additional data which is intended to be saved by the server. In WordPress, that data includes the commenter’s name, email, URL, and comment text. (If you are using Quiz, it also includes the content of the quiz answer field. Other plugins may add other fields.)
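    To make the GET/POST difference concrete, here is a small Python sketch that builds both kinds of request without sending anything. The blog URL is a placeholder, and the script name `wp-comments-post.php` and the field names are what I understand a stock WordPress comment form to use; treat them as assumptions and check your own form’s HTML.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request

BLOG = "https://blog.example"  # placeholder blog URL

# A GET names a URL and carries no body.
get_req = Request(f"{BLOG}/?p=123")

# A POST names a URL and also carries form data in its body.
# Field names are assumed, not taken from the discussion above.
form = {
    "author": "Alice",
    "email": "alice@example.com",
    "url": "https://alice.example",
    "comment": "Nice post!",
    "comment_post_ID": "123",
}
post_req = Request(
    f"{BLOG}/wp-comments-post.php",
    data=urlencode(form).encode("ascii"),
)

# Nothing is sent here; we only inspect the request objects.
print(get_req.get_method(), get_req.data)
body = parse_qs(post_req.data.decode("ascii"))
print(post_req.get_method(), body["comment"])
```

    The first request has no body at all, while the second carries the comment fields that the server is expected to save.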

    A typical human commenter would first GET your post, then spend time reading it, then POST a comment. The URL used to GET your post might be anything, but the URL of the comment submission script is the same for all WordPress blogs: it’s your blog URL plus a standard file name. The required parameters for the POST data are also standardized throughout the WordPress world.

    Thus a spammer can write just one spam script and feed it a list of blog URLs and the script will be able to POST comments without first GETting the articles they intend to spam. (This is an oversimplification. For a comment to be accepted it requires a valid post_id, a number which might not be easy to guess. Therefore some spam scripts will first crawl the blog with GETs to discover valid post_ids. Others might simply guess, or take a list of IDs as part of the input along with the blog URLs.)

    Certain plugins and hacks alter the comment script URL as part of an anti-spam strategy. This foils only the most simplistic spammers since the script URL can be discovered with just one GET. The URL is included in every page that contains a comment form.

    Quiz works by adding a requirement to the POST data. You can set a different question and answer for each post. This pretty much limits spam to humans being paid to submit comments manually since your questions should not be answerable by any script that a spammer would find affordable. Spammers are still able to send their POST requests but Quiz blocks their data from entering your database. The immediate benefit to you is that you spend less time processing spam.
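    The kind of check described here can be sketched roughly as follows. This is not the plugin’s actual code, just the general idea in Python; the `QUIZZES` table, the `accept_comment` function, and the `post_id` and `quiz_answer` field names are all hypothetical.

```python
# Hypothetical per-post questions and answers, keyed by post ID.
QUIZZES = {
    101: ("What is 2+2?", "4"),
    102: ("What color is the sky on a clear day?", "blue"),
}


def accept_comment(post_data):
    """Return True only if the POST data carries the right quiz answer."""
    try:
        post_id = int(post_data.get("post_id", ""))
    except ValueError:
        return False  # missing or non-numeric post ID
    quiz = QUIZZES.get(post_id)
    if quiz is None:
        return False  # no such post
    _question, expected = quiz
    given = post_data.get("quiz_answer", "")
    # Compare loosely so " Blue " still matches "blue".
    return given.strip().lower() == expected.lower()


# A human who read the label fills in the extra field.
print(accept_comment({"post_id": "101", "quiz_answer": " 4 "}))
# A script sending only the standard parameters has no answer field.
print(accept_comment({"post_id": "101", "comment": "buy pills"}))
```

    The rejected request still reaches the server and costs a little processing time; it just never makes it into the database.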

    There are better, more clever plugins that work much harder to foil spammers. Bad Behavior is the name of a famous one that I recall. If I remember correctly it can block spam GETs as well as POSTs. The additional benefit here is that your server spends less time processing spam so that it can be more responsive to your real audience.

    Thread Starter tixrus

    (@tixrus)

    Thank you, that’s the answer I was looking for. The key is that wp-stats needs JavaScript to run in order to count an HTTP request, and automated POST “visits” don’t run it, so they don’t count. I would have to look at something more primitive like awstats to see how much bandwidth was being wasted on spam.

  • The topic ‘Spam vs. Stats.’ is closed to new replies.