Distributed Checksum Clearinghouse

Last updated

Distributed Checksum Clearinghouse (also referred to as DCC) is a method of spam email detection.

Contents

The basic logic in DCC is that most spam mails are sent to many recipients. The same message body appearing many times is therefore bulk email. DCC identifies bulk email by calculating a fuzzy checksum on it and sending that to a DCC server. The server responds with the number of times it has received that checksum. An individual email will create a score of 1 each time it is processed. Bulk mail can be identified because the response number is high. The content is not examined. DCC works over the UDP protocol and uses little bandwidth.

DCC is resistant to hashbusters because "the main DCC checksums are fuzzy and ignore aspects of messages. The fuzzy checksums are changed as spam evolves" [1] DCC is likely to identify mailing lists as bulk email unless they are white listed. Likewise, repeatedly sending the same email to a server increases its number in the server, and, therefore, the likelihood of it being treated as spam by others.

History

According to the official DCC website:

The DCC is based on an idea of Paul Vixie and on fuzzy body matching to reject spam on a corporate firewall operated by Vernon Schryver starting in 1997. The DCC was designed and written at Rhyolite Software starting in 2000. It has been used in production since the winter of 2000/2001.

Related Research Articles

<span class="mw-page-title-main">Checksum</span> Data used to detect errors in other data

A checksum is a small-sized block of data derived from another block of digital data for the purpose of detecting errors that may have been introduced during its transmission or storage. By themselves, checksums are often used to verify data integrity but are not relied upon to verify data authenticity.

<span class="mw-page-title-main">Email</span> Mail sent using electronic means

Electronic mail is a method of transmitting and receiving messages using electronic devices. It was conceived in the late–20th century as the digital version of, or counterpart to, mail. Email is a ubiquitous and very widely used communication medium; in current use, an email address is often treated as a basic and necessary part of many processes in business, commerce, government, education, entertainment, and other spheres of daily life in most countries.

The Simple Mail Transfer Protocol (SMTP) is an Internet standard communication protocol for electronic mail transmission. Mail servers and other message transfer agents use SMTP to send and receive mail messages. User-level email clients typically use SMTP only for sending messages to a mail server for relaying, and typically submit outgoing email to the mail server on port 587 or 465 per RFC 8314. For retrieving messages, IMAP is standard, but proprietary servers also often implement proprietary protocols, e.g., Exchange ActiveSync.

<span class="mw-page-title-main">Spamming</span> Unsolicited electronic messages, especially advertisements

Spamming is the use of messaging systems to send multiple unsolicited messages (spam) to large numbers of recipients for the purpose of commercial advertising, for the purpose of non-commercial proselytizing, for any prohibited purpose, or simply repeatedly sending the same message to the same user. While the most widely recognized form of spam is email spam, the term is applied to similar abuses in other media: instant messaging spam, Usenet newsgroup spam, Web search engine spam, spam in blogs, wiki spam, online classified ads spam, mobile phone messaging spam, Internet forum spam, junk fax transmissions, social spam, spam mobile apps, television advertising and file sharing spam. It is named after Spam, a luncheon meat, by way of a Monty Python sketch about a restaurant that has Spam in almost every dish in which Vikings annoyingly sing "Spam" repeatedly.

<span class="mw-page-title-main">Open mail relay</span>

An open mail relay is a Simple Mail Transfer Protocol (SMTP) server configured in such a way that it allows anyone on the Internet to send e-mail through it, not just mail destined to or originating from known users. This used to be the default configuration in many mail servers; indeed, it was the way the Internet was initially set up, but open mail relays have become unpopular because of their exploitation by spammers and worms. Many relays were closed, or were placed on blacklists by other servers.

A Domain Name System blocklist, Domain Name System-based blackhole list, Domain Name System blacklist (DNSBL) or real-time blackhole list (RBL) is a service for operation of mail servers to perform a check via a Domain Name System (DNS) query whether a sending host's IP address is blacklisted for email spam. Most mail server software can be configured to check such lists, typically rejecting or flagging messages from such sites.

A mailing list is a collection of names and addresses used by an individual or an organization to send material to multiple recipients. The term is often extended to include the people subscribed to such a list, so the group of subscribers is referred to as "the mailing list", or simply "the list".

<span class="mw-page-title-main">Apache SpamAssassin</span> Open-source e-mail spam filter

Apache SpamAssassin is a computer program used for e-mail spam filtering. It uses a variety of spam-detection techniques, including DNS and fuzzy checksum techniques, Bayesian filtering, external programs, blacklists and online databases. It is released under the Apache License 2.0 and is a part of the Apache Foundation since 2004.

Various anti-spam techniques are used to prevent email spam.

<span class="mw-page-title-main">Email spam</span> Unsolicited electronic advertising by email

Email spam, also referred to as junk email, spam mail, or simply spam, is unsolicited messages sent in bulk by email (spamming). The name comes from a Monty Python sketch in which the name of the canned pork product Spam is ubiquitous, unavoidable, and repetitive. Email spam has steadily grown since the early 1990s, and by 2014 was estimated to account for around 90% of total email traffic.

Naive Bayes classifiers are a popular statistical technique of e-mail filtering. They typically use bag-of-words features to identify email spam, an approach commonly used in text classification.

Sender Policy Framework (SPF) is an email authentication method which ensures the sending mail server is authorized to originate mail from the email sender's domain. This authentication only applies to the email sender listed in the "envelope from" field during the initial SMTP connection. If the email is bounced, a message is sent to this address, and for downstream transmission it typically appears in the "Return-Path" header. To authenticate the email address which is actually visible to recipients on the "From:" line, other technologies such as DMARC must be used. Forgery of this address is known as email spoofing, and is often used in phishing and email spam.

Greylisting is a method of defending e-mail users against spam. A mail transfer agent (MTA) using greylisting will "temporarily reject" any email from a sender it does not recognize. If the mail is legitimate, the originating server will try again after a delay, and if sufficient time has elapsed, the email will be accepted.

A bounce message or just "bounce" is an automated message from an email system, informing the sender of a previous message that the message has not been delivered. The original message is said to have "bounced".

<span class="mw-page-title-main">Message submission agent</span>

A message submission agent (MSA), or mail submission agent, is a computer program or software agent that receives electronic mail messages from a mail user agent (MUA) and cooperates with a mail transfer agent (MTA) for delivery of the mail. It uses ESMTP, a variant of the Simple Mail Transfer Protocol (SMTP), as specified in RFC 6409.

Email harvesting or scraping is the process of obtaining lists of email addresses using various methods. Typically these are then used for bulk email or spam.

A challenge–response system is a type of that automatically sends a reply with a challenge to the (alleged) sender of an incoming e-mail. It was originally designed in 1997 by Stan Weatherby, and was called Email Verification. In this reply, the purported sender is asked to perform some action to assure delivery of the original message, which would otherwise not be delivered. The action to perform typically takes relatively little effort to do once, but great effort to perform in large numbers. This effectively filters out spammers. Challenge–response systems only need to send challenges to unknown senders. Senders that have previously performed the challenging action, or who have previously been sent e-mail(s) to, would be automatically receive a challenge.

DomainKeys Identified Mail (DKIM) is an email authentication method designed to detect forged sender addresses in email, a technique often used in phishing and email spam.

Backscatter is incorrectly automated bounce messages sent by mail servers, typically as a side effect of incoming spam.

Email spammers have developed a variety of ways to deliver email spam throughout the years, such as mass-creating accounts on services such as Hotmail or using another person's network to send email spam. Many techniques to block, filter, or otherwise remove email spam from inboxes have been developed by internet users, system administrators and internet service providers. Due to this, email spammers have developed their own techniques to send email spam, which are listed below.

References

  1. Distributed Checksum Clearinghouses official website