SpamAssassin Rarely Misses

by J.D. Falk
Director of Product Strategy, Receiver Services

SpamAssassin is, by any measure, the most popular open source spam filtering software. It has won numerous awards, and has been incorporated into many commercial filtering appliances. On Tuesday, the SpamAssassin developers announced version 3.3.0, their first major update since 2007.

SpamAssassin was born in 2001, when Justin Mason (who is still involved in the project) rewrote & updated an earlier open-source filtering script. At present it primarily consists of a set of message tests of varying complexity, each analyzing portions of the headers or body and adding to or subtracting from the resulting spam score.

In general, any message with a SpamAssassin score of 5 or more is considered to be spam. It’s very rare for a single test to contribute enough to the score to be definitively spam or definitively not spam; instead, the effects of multiple tests are cumulative.
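The cumulative scoring described above can be sketched in a few lines of Python. This is only an illustration of the idea, not SpamAssassin's actual implementation: the per-test scores below are made up for the example (apart from the -2 for RCVD_IN_RP_SAFE, which is discussed later in this article), and the function names are hypothetical.

```python
# Illustrative scores only; real SpamAssassin scores differ and change
# between releases. Only the -2.0 for RCVD_IN_RP_SAFE comes from the article.
EXAMPLE_SCORES = {
    "MPART_ALT_DIFF": 0.8,
    "DRUG_ED_CAPS": 2.5,
    "FSL_GEO_ABUSE": 1.9,
    "RCVD_IN_RP_SAFE": -2.0,
}

SPAM_THRESHOLD = 5.0  # the conventional cutoff mentioned above


def total_score(hits, scores=EXAMPLE_SCORES):
    """Add up the scores of every test that fired on a message."""
    return sum(scores[name] for name in hits)


def is_spam(hits, threshold=SPAM_THRESHOLD, scores=EXAMPLE_SCORES):
    """Treat a message as spam once its cumulative score reaches the threshold."""
    return total_score(hits, scores) >= threshold


# No single test is decisive, but three together cross the line.
print(is_spam(["DRUG_ED_CAPS"]))
print(is_spam(["MPART_ALT_DIFF", "DRUG_ED_CAPS", "FSL_GEO_ABUSE"]))
```

With these example numbers, a single hit on DRUG_ED_CAPS stays well below 5, while the three tests together add up to roughly 5.2 and tip the message over the threshold.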

A few examples of tests in this version:

  • MPART_ALT_DIFF detects when the HTML and plain text parts of a message are substantially different.
  • DRUG_ED_CAPS catches messages which shout out the names of popular erectile dysfunction drugs.
  • FSL_GEO_ABUSE looks for any geocities.com URL in the message; the site was finally closed after many years of being a favorite with spammers, so now any link to it is invalid.
  • FH_DATE_PAST_20XX is intended to detect when a message’s Date: header is too far into the future; spammers do that to make sure their messages show up at the top of your inbox, assuming they’ll be reverse-sorted by date. The test wasn’t updated in time for 2010, so it briefly fired on legitimate mail dated 2010; that caused some concern for a few days but was fixed quickly.
  • There are many more tests, though not all are well-documented.
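To see why a test like FH_DATE_PAST_20XX can go stale, here is a rough Python sketch. It assumes the test boils down to a pattern match on the year in the Date: header; the exact pattern below is an assumption for illustration, not a copy of the real rule, and both function names are hypothetical. The second function shows one alternative, comparing the header against the current time.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Assumed, illustrative pattern: treats any year from 2010 through 2099 as
# "far in the future". That is reasonable in 2009 and wrong from 2010 onward.
STALE_YEAR_PATTERN = re.compile(r"\b20[1-9][0-9]\b")


def hardcoded_future_check(date_header):
    """Fire whenever the header mentions a year the pattern considers 'future'."""
    return bool(STALE_YEAR_PATTERN.search(date_header))


def relative_future_check(date_header, now, tolerance=timedelta(days=2)):
    """Fire only when the header is later than `now` plus a tolerance."""
    sent = parsedate_to_datetime(date_header)
    return sent > now + tolerance


now = datetime(2010, 1, 1, 13, 0, tzinfo=timezone.utc)
legit = "Fri, 01 Jan 2010 12:00:00 +0000"
print(hardcoded_future_check(legit))      # True: a false positive in 2010
print(relative_future_check(legit, now))  # False
```

The relative check needs no yearly maintenance, though it depends on the receiving system's clock being correct. That is a design trade-off, not a claim about how the project fixed the rule.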

The software is most commonly invoked by a process sitting between the Mail Transfer Agent (MTA), which receives messages from other servers on the Internet, and the Mail Delivery Agent (MDA), which places those messages into the appropriate mailbox file. Depending on how the system is configured, the message may be tagged by adding ***** SPAM ***** to the Subject: line, or by adding X-Spam-* headers (such as X-Spam-Status:) with details about which tests contributed to the score. Downstream processes in the MDA or the email client can use this information to place the message in an appropriate folder, or delete it outright.
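As a sketch of what such a downstream process might do, the Python below reads the X-Spam-Status: header that SpamAssassin commonly adds (in the form "Yes, score=7.3 required=5.0 tests=...") and picks a destination. The folder names, the `choose_folder` function, and the optional `delete_at` cutoff are hypothetical; a real MDA or mail client would have its own filtering rules.

```python
import re
from email import message_from_string

# Matches the start of a typical X-Spam-Status value: "Yes, score=7.3 ..."
STATUS_RE = re.compile(r"^(Yes|No),\s+score=(-?\d+(?:\.\d+)?)", re.IGNORECASE)


def choose_folder(raw_message, delete_at=None):
    """Return 'INBOX', 'Junk', or 'DELETE' based on the X-Spam-Status header."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    match = STATUS_RE.match(status.strip())
    if match is None:
        return "INBOX"  # untagged or unparseable: deliver normally
    flagged = match.group(1).lower() == "yes"
    score = float(match.group(2))
    if delete_at is not None and score >= delete_at:
        return "DELETE"
    return "Junk" if flagged else "INBOX"


SAMPLE = (
    "From: sender@example.com\n"
    "Subject: ***** SPAM ***** Cheap pills\n"
    "X-Spam-Status: Yes, score=7.3 required=5.0 tests=DRUG_ED_CAPS,MPART_ALT_DIFF\n"
    "\n"
    "Body text.\n"
)
print(choose_folder(SAMPLE))  # Junk
```

Leaving `delete_at` unset by default reflects a cautious choice: filing a message in a junk folder is recoverable, while deleting it outright is not.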

Alternatively, some systems feed the message to SpamAssassin directly from the MTA during the initial SMTP transaction, which allows them to reject it with a 550 SMTP reply when the spam score is sufficiently high — usually 10 or more.
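The decision made during the SMTP transaction can be sketched as a simple pair of thresholds. This is a minimal illustration in Python, assuming the tag and reject cutoffs of 5 and 10 mentioned in this article; the function name and the reply text are hypothetical, and an actual MTA integration would be configured quite differently.

```python
def smtp_decision(score, tag_at=5.0, reject_at=10.0):
    """Map a spam score to an (SMTP reply code, reply text) pair."""
    if score >= reject_at:
        # Refuse the message before accepting responsibility for it.
        return (550, "5.7.1 Message rejected as spam")
    if score >= tag_at:
        # Accept the message, but leave it marked for downstream filtering.
        return (250, "OK, tagged as spam")
    return (250, "OK")


for score in (1.2, 6.4, 12.0):
    print(score, smtp_decision(score))
```

Rejecting at SMTP time means the sending server learns about the refusal immediately, which is why the reject threshold is usually set well above the tagging threshold.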

Mail system administrators can automatically download updated tests from the project, and have the ability to override any of those default settings. This allows the SpamAssassin developers to stay current in the face of ever-changing spamming techniques, and to remove or reduce the score of any tests which are inappropriately catching non-spam email. Administrators may choose to disable these automatic updates, but it’s unclear why they’d want to.

There are also a few new network tests which we’re particularly pleased with:

  • RCVD_IN_RP_SAFE detects messages sent from IP addresses on our Safe whitelist, and reduces the spam score by 2.
  • RCVD_IN_RP_CERTIFIED detects messages sent from IP addresses on our Certified whitelist, and reduces the spam score by 3. Every IP on Certified is also on Safe, so it’s actually reduced by 5.
  • RCVD_IN_RP_RNBL detects messages sent from IP addresses on our Reputation Network Blacklist. It only affects the score by 1.2-1.3 points at present, because messages sent by those IPs tend to also trigger lots of other tests.
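The arithmetic for these three tests can be worked through in a short Python sketch. The scores are the ones quoted above (with 1.3 chosen from the 1.2-1.3 range as an assumption); the `rp_adjustment` function is hypothetical and exists only to show how the Certified and Safe scores stack.

```python
RP_TEST_SCORES = {
    "RCVD_IN_RP_SAFE": -2.0,
    "RCVD_IN_RP_CERTIFIED": -3.0,
    "RCVD_IN_RP_RNBL": 1.3,  # assumed: upper end of the quoted 1.2-1.3 range
}


def rp_adjustment(on_certified=False, on_safe=False, on_rnbl=False):
    """Return the net change to the spam score from the three network tests."""
    hits = set()
    if on_certified:
        # Every IP on Certified is also on Safe, so both tests fire.
        hits.update({"RCVD_IN_RP_CERTIFIED", "RCVD_IN_RP_SAFE"})
    if on_safe:
        hits.add("RCVD_IN_RP_SAFE")
    if on_rnbl:
        hits.add("RCVD_IN_RP_RNBL")
    return sum(RP_TEST_SCORES[name] for name in hits)


print(rp_adjustment(on_certified=True))  # -5.0
print(rp_adjustment(on_safe=True))       # -2.0
```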

RCVD_IN_RP_SAFE and RCVD_IN_RP_CERTIFIED replace old tests left over from the Bonded Sender and Habeas days, which was important because some members of the SpamAssassin community still believed that senders had to pay a bond to be on the Bonded Sender list, or that an X-Habeas: haiku header denoted approval by Habeas, neither of which has been true in many years.

We didn’t pay the Apache Software Foundation (which hosts & sponsors the SpamAssassin project) for these scores, or try to “sell” the developers on using them. We did talk about the products with them for quite a while: what the listing criteria are, our plans for the future, et cetera. Some of the developers & community members were friendly, others…not so much. In the end, it was SpamAssassin’s own testing process which convinced them to include these tests with these scores. The data spoke for itself, and they saw the value in it.

This is standard procedure for the SpamAssassin development team, with its deep roots in the open source community. Because the project is open, anyone can participate in the discussions, which is both a blessing and a curse. As in any other debate about spam, conversations within the community occasionally get heated, and a few members are nearly ridiculous in their intractability. Yet when it comes to the product itself, the developers trust the data produced by their nightly testing framework. If the data shows a test is accurate and effective, they’ll include it. If not, they won’t, or it’ll be given a low score.

I could conclude this article by saying that we look forward to continuing our relationship with the SpamAssassin community, and that’s certainly true — but it’s not the whole story. I use SpamAssassin to protect my personal email, as do many others among the technical staff here at Return Path. We also use SpamAssassin to protect some of our corporate email systems; it’s that good. You’ll hear similar stories across the industry, and beyond. It is one of the few software packages to truly deserve to be called “ubiquitous.”
