This article includes a list of general references, but it lacks sufficient corresponding inline citations .(March 2013) |
Bogofilter is a mail filter that classifies e-mail as spam or ham (non-spam) by a statistical analysis of the message's header and content (body). [1] The program is able to learn from the user's classifications and corrections. It was originally written by Eric S. Raymond after he read Paul Graham's article "A Plan for Spam" and is now maintained together with a group of contributors by David Relson, Matthias Andree [2] and Greg Louis.
The statistical technique used is known as Bayesian filtering. Bogofilter's primary algorithm uses the f(w) parameter and the Fisher inverse chi-square technique that he describes.
Bogofilter may be run by a MDA or mail client to classify messages as they are delivered to recipient mailboxes, or be used by a MTA to classify messages as they are received from the sending SMTP server. Bogofilter examines tokens in the message body and header, and refers to wordlists stored by BerkeleyDB, SQLite or QDBM to calculate a probability score that a new message is spam. Bogofilter provides processing for plain text and HTML and supports reading multi-part MIME message including base64, quoted-printable, and uuencoded text or HTML. Bogofilter ignores non-text attachments, such as images.
It is possible to tune Bogofilter's statistical algorithms by modifying various coefficients and other settings in its configuration file, or by using the automated bogotune utility included with the software, which attempts to optimise various coefficients to maximise filtering efficiency for a particular corpus of spam and non-spam.
Standard tests at TREC 2005 show that Bogofilter compares well to its competitors spambayes, CRM114 and DSPAM. Other competitors include, but are not limited to Spamprobe and QSF.
Bogofilter is written in C, and runs on Linux, FreeBSD, NetBSD, OpenBSD, Solaris, Mac OS X, HP-UX, AIX and other platforms. It is released under the GNU GPL.
The following email clients are known to support Bogofilter as a spam filtering backend:
Electronic mail is a method of transmitting and receiving messages using electronic devices. It was conceived in the late–20th century as the digital version of, or counterpart to, mail. Email is a ubiquitous and very widely used communication medium; in current use, an email address is often treated as a basic and necessary part of many processes in business, commerce, government, education, entertainment, and other spheres of daily life in most countries.
Multipurpose Internet Mail Extensions (MIME) is an Internet standard that extends the format of email messages to support text in character sets other than ASCII, as well as attachments of audio, video, images, and application programs. Message bodies may consist of multiple parts, and header information may be specified in non-ASCII character sets. Email messages with MIME formatting are typically transmitted with standard protocols, such as the Simple Mail Transfer Protocol (SMTP), the Post Office Protocol (POP), and the Internet Message Access Protocol (IMAP).
Sylpheed is an open-source e-mail client and news client licensed under GNU GPL-2.0-or-later with the library part LibSylph under GNU LGPL-2.1-or-later. It provides easy configuration and an abundance of features. It stores mail in the MH Message Handling System. Sylpheed runs on Unix-like systems such as Linux or BSD, and it is also usable on Windows. It uses GTK+.
procmail is an email server software component — specifically, a message delivery agent (MDA). It was one of the earliest mail filter programs. It is typically used in Unix-like mail systems, using the mbox and Maildir storage formats.
Apache SpamAssassin is a computer program used for e-mail spam filtering. It uses a variety of spam-detection techniques, including DNS and fuzzy checksum techniques, Bayesian filtering, external programs, blacklists and online databases. It is released under the Apache License 2.0 and is a part of the Apache Foundation since 2004.
Various anti-spam techniques are used to prevent email spam.
CRM114 is a program based upon a statistical approach for classifying data, and especially used for filtering email spam.
Naive Bayes classifiers are a popular statistical technique of e-mail filtering. They typically use bag-of-words features to identify email spam, an approach commonly used in text classification.
Hashcash is a proof-of-work system used to limit E-mail spam and denial-of-service attacks. Hashcash was proposed in 1997 by Adam Back and described more formally in Back's 2002 paper "Hashcash - A Denial of Service Counter-Measure".
Pegasus Mail is a proprietary email client developed by David Harris. It was originally released in 1990 for internal and external mail on NetWare networks with MS-DOS and later Apple Macintosh clients. It was subsequently ported to Microsoft Windows, which is now the only platform actively supported. Previously freeware, Pegasus Mail is now donationware.
POPFile is an abandoned free, open-source, cross-platform mail filter originally written in Perl by John Graham-Cumming and maintained by a team of volunteers. It uses a naive Bayes classifier to filter mail. This allows the filter to "learn" and classify mail according to the user's preferences. Typically it is used to filter spam mail. It can also be used to sort mail into other user defined "buckets" or categories - for example, the user may define a bucket into which work email is sorted.
Email filtering is the processing of email to organize it according to specified criteria. The term can apply to the intervention of human intelligence, but most often refers to the automatic processing of messages at an SMTP server, possibly applying anti-spam techniques. Filtering can be applied to incoming emails as well as to outgoing ones.
Kontact is a personal information manager and groupware software suite developed by KDE. It supports calendars, contacts, notes, to-do lists, news, and email. It offers a number of inter-changeable graphical UIs all built on top of a common core.
The following tables compare general and technical features of notable email client programs.
HTML email is the use of a subset of HTML to provide formatting and semantic markup capabilities in email that are not available with plain text: Text can be linked without displaying a URL, or breaking long URLs into multiple pieces. Text is wrapped to fit the width of the viewing window, rather than uniformly breaking each line at 78 characters. It allows in-line inclusion of images, tables, as well as diagrams or mathematical formulae as images, which are otherwise difficult to convey.
Claws Mail is a free and open-source, C/GTK-based e-mail client, which is both lightweight and highly configurable. Claws Mail runs on both Windows and Unix-like systems such as Linux, BSD, and Solaris. It stores mail in the MH mailbox format. Plugins allow to read HTML mail, but there is none to compose HTML messages.
Gary Robinson is an American software engineer and mathematician and inventor notable for his mathematical algorithms to fight spam. In addition, he patented a method to use web browser cookies to track consumers across different web sites, allowing marketers to better match advertisements with consumers. The patent was bought by DoubleClick, and then DoubleClick was bought by Google. He is credited as being one of the first to use automated collaborative filtering technologies to turn word-of-mouth recommendations into useful data.
People tend to be much less bothered by spam slipping through filters into their mail box, than having desired e-mail ("ham") blocked. Trying to balance false negatives vs false positives is critical for a successful anti-spam system. As servers are not able to block all spam there are some tools for individual users to help control over this balance.
Amavis is an open-source content filter for electronic mail, implementing mail message transfer, decoding, some processing and checking, and interfacing with external content filters to provide protection against spam and viruses and other malware. It can be considered an interface between a mailer and one or more content filters.
This article, or an earlier revision of it, was edited from bogofilter's homepage.