How Bayesian Spam Filters Work

Those of us greeted by tens, if not hundreds, of unwanted emails each time we open our inboxes have some hope of respite in the form of Bayesian spam filters. For years, spammers stayed one step ahead of spam blockers by adjusting their techniques to evade each new filter as it was developed. As a result, the task before antispam software developers was clear: build software that continually learns from spammers' new techniques and so never falls behind in the spam-blocking game.

Only recently has such a solution been developed, in the form of Bayesian filters. The Bayesian statistical method on which the filters are based works by dividing emails into categories. The software keeps a log of the emails you choose to open and those you simply delete, and it tracks the characteristics of both groups. Over time, it learns from these aggregate figures. It comes to recognize words that appear frequently in the emails you consistently ignore, and it becomes more likely to categorize an email as spam when those words appear in it often.
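
The learning process described above can be sketched in a few lines of Python. This is only a minimal illustration, not the code of any particular product: the class name WordCountFilter, the labels "spam" and "ham" (for ignored and opened mail), the simple tokenizer, and the add-one smoothing are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a message into lowercase words (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


class WordCountFilter:
    """Hypothetical naive Bayes filter that learns from word frequencies."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record the words of one message under 'spam' or 'ham'."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Estimate the probability that a message is spam."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Prior: how common this category is overall (add-one smoothed).
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Likelihood of each word in this category; the +1 keeps
                # unseen words from producing a zero probability.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Turn the two log scores into a probability; clamp to avoid overflow.
        diff = log_scores["ham"] - log_scores["spam"]
        diff = max(min(diff, 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


spam_filter = WordCountFilter()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("buy cheap watches now", "spam")
spam_filter.train("lunch with sue tomorrow", "ham")
spam_filter.train("meeting notes from sue", "ham")
print(spam_filter.spam_probability("cheap pills"))
print(spam_filter.spam_probability("lunch with sue"))
```

With this tiny training set, the first message scores well above one half and the second well below it, because the words in each appear mostly in one category.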

But lest you be concerned that not opening a few emails from your Aunt Sue will suddenly cause every email containing the word ‘Sue’ to be categorized as spam, you should know that Bayesian filters work in the aggregate. This means that while your decisions about which emails to open will affect the algorithm, so will the activities of thousands, if not tens of thousands, of other users. With data spread across so many users and gathered over a long period, there is only a minimal danger of false labeling. What you get instead is a very accurate long-term tool for blocking spam. Unfortunately, the same characteristics that prevent false blocking also keep a Bayesian spam filter from stopping the first wave of a new spamming technique. The usual effect is that a new technique slips through the cracks for a few days or weeks until the algorithm has absorbed it.
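
The Aunt Sue example can be made concrete with a small sketch of pooled counts. The functions pool_counts and spam_ratio, the data layout, and the numbers are all invented for illustration; they show only why one user's habits carry little weight once many users' counts are combined.

```python
from collections import Counter


def pool_counts(per_user_counts):
    """Add up every user's word counts into one shared table."""
    pooled = {"spam": Counter(), "ham": Counter()}
    for user_counts in per_user_counts:
        for label in pooled:
            pooled[label].update(user_counts.get(label, {}))
    return pooled


def spam_ratio(counts, word):
    """Smoothed share of a word's occurrences that fell in ignored mail."""
    spam = counts["spam"][word]
    ham = counts["ham"][word]
    return (spam + 1) / (spam + ham + 2)


# You ignored three emails that mention "sue" (hypothetical figures).
you = {"spam": {"sue": 3}, "ham": {}}
# Fifty other users each opened two emails that mention "sue".
others = [{"spam": {}, "ham": {"sue": 2}} for _ in range(50)]

alone = spam_ratio(pool_counts([you]), "sue")
together = spam_ratio(pool_counts([you] + others), "sue")
print(alone, together)
```

On your counts alone the word looks suspicious (a ratio of 0.8), but pooled with the other users it drops to roughly 0.04. The same averaging explains the delay noted above: a new spam word needs many observations before it moves the pooled ratio.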

On the other hand, one of the great benefits of Bayesian spam filters is that they can be individualized. If your online interests bring you a disproportionate amount of a particular kind of spam, you can tell a Bayesian spam blocker to treat the words that mark that spam as suspicious. In other words, left to itself a Bayesian filter is cautious about over-blocking, but when prompted by the user it can be made to block a specific type of spam email.
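
One way to picture this kind of individualization is a table of user overrides that takes precedence over the learned word probabilities. The sketch below is an assumption about how such a feature could work, not a description of any real product: the function name combined_spam_score, the 0.99 override value, and the sample probabilities are hypothetical.

```python
import math


def combined_spam_score(words, learned, overrides=None, default=0.5):
    """Combine per-word spam probabilities into one score for a message.

    learned   -- word -> spam probability the filter worked out on its own
    overrides -- word -> spam probability set by the user (takes precedence)
    """
    overrides = overrides or {}
    log_spam = 0.0
    log_ham = 0.0
    for word in set(words):
        p = overrides.get(word, learned.get(word, default))
        # Keep probabilities away from 0 and 1 so the logarithms stay finite.
        p = min(max(p, 0.01), 0.99)
        log_spam += math.log(p)
        log_ham += math.log(1.0 - p)
    # Equivalent to prod(p) / (prod(p) + prod(1 - p)), computed in log space.
    diff = max(min(log_ham - log_spam, 700.0), -700.0)
    return 1.0 / (1.0 + math.exp(diff))


learned = {"casino": 0.5, "bonus": 0.6, "newsletter": 0.4}
message = ["casino", "bonus", "newsletter"]

cautious = combined_spam_score(message, learned)
prompted = combined_spam_score(message, learned, overrides={"casino": 0.99})
print(cautious, prompted)
```

Unprompted, the filter is undecided about this message (a score of about 0.5). Once the user marks "casino" as spammy, the same message scores about 0.99.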

Only recently has the technology become commercially available. Currently it comes in two forms. The first is a limited number of antispam programs that can be purchased and that scan each email before it is opened. The second is embedded in the mail server software itself, meaning that the customer’s emails have already been scanned and classified before he or she even logs in to the email provider.

