November 7, 2006 at 4:08AM Spammers, go home!
Seeing as I’m getting well over 400 spam comments here a day, I decided to make my filtering a wee bit more aggressive. Rather than periodically checking over any new comments, it now scans each new comment as it comes in. If a comment doesn’t get past the email address whitelist and it contains dodgy words and phrases, it gets marked as spam and, as far as the spammer is concerned, disappears into a black hole. The method used for scanning is also better than before. I should have done it that way originally.
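A minimal Python sketch of that check, for illustration only. The whitelist entry, the first two phrases and the function name `is_spam` are made up; the post doesn't list the real ones.

```python
# Hypothetical data: the real whitelist and phrase list are not given in the post.
WHITELIST = {"friend@example.com"}
DODGY_PHRASES = ("cheap pills", "online casino")


def is_spam(email, body):
    """Return True if a new comment should be marked as spam."""
    # Whitelisted addresses skip the content check entirely.
    if email.strip().lower() in WHITELIST:
        return False
    # Otherwise, any dodgy phrase in the body is enough to flag it.
    text = body.lower()
    return any(phrase in text for phrase in DODGY_PHRASES)


if __name__ == "__main__":
    print(is_spam("someone@example.org", "Buy CHEAP PILLS today"))
    print(is_spam("friend@example.com", "cheap pills, ha ha"))
```

Running the check at submission time rather than in a periodic sweep means a flagged comment never shows up on the page at all.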
Mind you, I’m still keeping the spam. As I said before, I want to analyse the stuff so that I can come up with a few other more subtle methods of catching spam. One idea that looks promising is running a very simple dictionary compression algorithm over each spam comment. Most of them have a high level of redundancy, so if a comment compresses particularly well, much better than regular text anyway, that’s a strong hint it’s spam.
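Here is a rough Python sketch of the compression idea. It swaps in `zlib` (DEFLATE, which is dictionary-based) for the "very simple" algorithm mentioned above, and the `threshold` and `min_length` values are guesses, not tuned numbers.

```python
import zlib


def compression_ratio(text):
    """Compressed size divided by original size; lower means more redundant."""
    raw = text.encode("utf-8")
    if not raw:
        return 1.0
    return len(zlib.compress(raw, 9)) / len(raw)


def looks_redundant(text, threshold=0.3, min_length=200):
    """Flag text that compresses far better than ordinary prose would.

    threshold and min_length are assumed values for illustration.
    """
    # Very short comments are skipped: the compressor's fixed overhead
    # makes their ratios meaningless.
    if len(text.encode("utf-8")) < min_length:
        return False
    return compression_ratio(text) < threshold


if __name__ == "__main__":
    spam = "buy cheap pills http://example.com/pills " * 50
    print(round(compression_ratio(spam), 3), looks_redundant(spam))
```

A sensible threshold would have to come from comparing ratios for the saved spam against ratios for genuine comments.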
One small thing I have to do is check to see if a posted comment is a duplicate of one posted within the last five minutes or so. If it is, it dies. I’m not doing this right now and I should, if only to catch those times when somebody accidentally clicks the submit button twice. Still, actually using the formkeys code would be the right way of coping with the latter case and it would catch some other spammers to boot.
Spammers will be happy to know that I have uncommented the code to send me an email whenever I get a comment that runs the gauntlet unscathed. Spammers will be disappointed to note that my contact form is now protected from spam and that spamming either form will give me more data to analyse and more chance of blocking future spam. Ha!
Update: This is going exceedingly well. In just six hours, it’s caught 128 pieces of spam and I’ve updated the filters to catch [/url] and [/link]. I’d been a bit wary about adding those on account of possible false positives, but we’ll wait and see.
Another Update: It now checks the IPs, emails and homepages given against the ones already included in my spam corpus to catch the remaining few that still slip through. We’ll see what happens with that. I reckon it’s worth getting the spammers themselves to help with the weeding process. Of course, as I get more spam, this’ll start to scale a little worse, but for now I’ve no need to be too bothered.
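A small Python sketch of that corpus lookup. The `SpamCorpus` class and its method names are hypothetical, and keeping the values in sets is an assumption; the post doesn't say how the corpus is stored or searched.

```python
def _norm(value):
    """Normalise a field so trivial differences in case or spacing still match."""
    return value.strip().lower()


class SpamCorpus:
    """Remembers the IPs, emails and homepages seen on known spam."""

    def __init__(self):
        self.seen = {"ip": set(), "email": set(), "homepage": set()}

    def add(self, ip="", email="", homepage=""):
        """Record the details of a comment already judged to be spam."""
        for field, value in (("ip", ip), ("email", email), ("homepage", homepage)):
            value = _norm(value)
            if value:  # blank fields are never stored
                self.seen[field].add(value)

    def matches(self, ip="", email="", homepage=""):
        """Return True if any non-blank field has been seen on spam before."""
        fields = (("ip", ip), ("email", email), ("homepage", homepage))
        return any(_norm(v) in self.seen[f] for f, v in fields if _norm(v))


if __name__ == "__main__":
    corpus = SpamCorpus()
    corpus.add("203.0.113.5", "spam@example.com", "http://spam.example/")
    print(corpus.matches("198.51.100.1", "SPAM@example.com", ""))
```

Set lookups stay roughly constant-time as the corpus grows, which is one way to sidestep the scaling worry, at the cost of holding every value in memory.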