1. Why Filter Mail During the SMTP Transaction?

1.1. Status Quo

If you receive spam, raise your hands. Keep them up.

If you receive computer virii or other malware, raise your hands too.

If you receive bogus Delivery Status Notifications (DSNs), such as Message Undeliverable, Virus found, Please confirm delivery, etc, related to messages you never sent, raise your hands as well. This is known as Collateral Spam.

This last form is particularly troublesome, because it is harder to weed out than standard spam or malware, and because such messages can be quite confusing to recipients who do not possess godly skills in parsing message headers. In the case of virus warnings, this often causes unnecessary concern on the recipient's end; more generally, a common tendency will be to ignore all such messages, thereby missing out on legitimate DSNs.

Finally, I want those of you who have lost legitimate mail into a big black hole - due to misclassification by spam or virus scanners - to lift your feet.

If you were standing before and are still standing, I suggest that you may not be fully aware of what is happening to your mail. If you have been doing any type of spam filtering, even by manually moving mails to the trash can in your mail reader, let alone by experimenting with primitive filtering techniques such as DNS blacklists (SpamHaus, SPEWS, SORBS...), chances are that you have lost some valid mail.

1.2. The Cause

Spam, just like many other artifacts of greed, is a social disease. Call it affluenza, or whatever you like; lower life forms seek to destroy a larger ecosystem, and if successful, will actually end up ruining their own habitat in the end.

Larger social issues and philosophy aside: You - the mail system administrator - face the very concrete and real life dilemma of finding a way to deal with all this junk.

As it turns out, there are some limitations with the conventional way that mail is being processed and delegated by the various components of mail transport and delivery software. In a traditional setup, one or more Mail Exchanger(s) accept most or all incoming mail deliveries to addresses within a domain. Often, they then forward the mail to one or more internal machines for further processing, and/or delivery to the user's mailboxes. If any of these servers discovers that it is unable to perform the requested delivery or function, it generates and returns a DSN back to the sender address in the original mail.

As organizations started deploying spam and virus scanners, they often found that the path of least resistance was to work these into the message delivery path, as mail is transferred from the incoming Mail Exchanger(s) to internal delivery hosts and/or software. For instance, a common way filter out spam is by routing the mail through SpamAssassin or other software before it is delivered to a user's mailbox, and/or rely on spam filtering capabilities in the user's Mail User Agent.

Options for dealing with mail that is classified as spam or virus at this point are limited:

  • You can return a Delivery Status Notification back to the sender. The problem is that nearly all spam and e-mail borne virii are delivered with faked sender addresses. If you return this mail, it will invariably go to innocent third parties -- perhaps warning a grandmother in Sweden, who uses Mac OS X and does not know much about computers, that she is infected by the Blaster worm. In other words, you will be generating Collateral Spam.

  • You can drop the message into the bit bucket, without sending any notification back to the sender. This is an even bigger problem in the case of False Positives, because neither the sender nor the receiver will ever know what happened to the message (or in the receiver's case, that it ever existed).

  • Depending on how your users access their mail (for instance, if they access it via the IMAP protocol or use a web-based mail reader, but not if they retreive it over POP-3), you may be able to file it into a separate junk folder for them -- perhaps as an option in their account settings.

    This may be the best of these three options. Even so, the messages may remain unseen for some time, or simply overlooked as the receiver more-or-less periodically scans through and deletes mail in their Junk folder.

1.3. The Solution

As you would have guessed by now, the One True solution to this problem is to do spam and virus filtering during the SMTP dialogue from the remote host, as the mail is being received by the inbound mail exchanger for your domain. This way, if the mail turns out to be undesirable, you can issue a SMTP reject response rather than face the dilemma described above. As a result:

  • You will be able to stop the delivery of most junk mail early in the SMTP transaction, before the actual message data has been received, thus saving you both network bandwidth and CPU processing.

  • You will be able to deploy some spam filtering techniques that are not possible later, such as SMTP transaction delays and Greylisting.

  • You will be able to notify the sender in case of a delivery failure (e.g. due to an invalid recipient address) without directly generating Collateral Spam

    We will discuss how you can avoid causing collateral spam indirectly as a result of rejecting mail forwarded from trusted sources, such as mailing list servers or mail accounts on other sites [1].

  • You will be able to protect yourself against collateral spam from others (such as bogus You have a virus messages from anti-virus software).

OK, you can lower your hands now. If you were standing, and your feet disappeared from under you, you can now also stand up again.



[1] Untrusted third party hosts may still generate collateral spam if you reject the mail. However, unless that host is an Open Proxy or Open Relay, it presumably delivers mail only from legitimate senders, whose addresses are valid. If it is an Open Proxy or SMTP Relay - well, it is better that you reject the mail and let it freeze in their outgoing mail queue than letting it freeze in yours. Eventually, this ought to give the owners of that server a clue.