PDF Spam
June 20th, 2007
The spam filtering setup on our server is pretty good - SpamAssassin with Bayesian filtering and the FuzzyOCR Plugin which I installed to deal with the rise of image-based spam last year. Still, a few email addresses that route to me are very public, and most days one or two spam messages get through the filters.
This morning I noticed something new in my inbox. I almost moved the message across into my “missed spam” folder without giving it a second thought (we train our filters with missed spam to improve the Bayesian analysis), but something caught my eye:
“That’s odd,” I thought, so I opened the PDF. (Note: unless you know what you’re doing, it’s a really bad idea to open attachments if you don’t know the sender or weren’t expecting something from them - it could be a virus.)
That’s right, it’s spam, in a PDF file. While SpamAssassin does a great job of analysing text, and even images using FuzzyOCR, no analysis is done of PDF attachments, so this one slipped through the net. (I’ve had seven copies of this so far today.)
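To make the gap concrete, here is a rough sketch of the very first step any PDF-aware filter would need: noticing that a message carries a PDF at all. It uses only Python's standard `email` module and is not how SpamAssassin or any real plugin works; the function name `pdf_attachments` and the three checks it applies (MIME type, file extension, `%PDF-` header bytes) are my own assumptions for illustration.

```python
from email import message_from_bytes
from email.message import EmailMessage
from typing import List


def pdf_attachments(raw_message: bytes) -> List[str]:
    """Return the filenames of MIME parts that look like PDF files.

    Hypothetical helper: a part counts as a PDF if its declared type,
    its filename, or the first bytes of its payload say so.
    """
    msg = message_from_bytes(raw_message)
    found = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold no payload of their own
        payload = part.get_payload(decode=True) or b""
        filename = part.get_filename() or ""
        looks_like_pdf = (
            part.get_content_type() == "application/pdf"
            or filename.lower().endswith(".pdf")
            or payload.startswith(b"%PDF-")  # PDF files begin with this marker
        )
        if looks_like_pdf:
            found.append(filename or "(unnamed)")
    return found


if __name__ == "__main__":
    # Build a throwaway message with a fake PDF attachment and scan it.
    demo = EmailMessage()
    demo["Subject"] = "demo"
    demo.set_content("See attached.")
    demo.add_attachment(
        b"%PDF-1.4 placeholder bytes",
        maintype="application",
        subtype="octet-stream",
        filename="report.pdf",
    )
    print(pdf_attachments(demo.as_bytes()))
```

Spotting the attachment is the easy part. Deciding whether its contents are spam would still mean extracting the text or images inside the PDF and scoring them, which this sketch does not attempt.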
What next? Well, if this type of spam continues (and there’s no reason to think it won’t), I expect we’ll see a PDF-scanning plugin for SpamAssassin before too long. Once that gains traction, the spammers will undoubtedly adapt again with some new trick to avoid the filters. Rinse, and repeat.
The arms race continues…


