Hash buster


A hash buster is a program which randomly adds characters to data in order to change the data's hash sum. [1]



Hash busters are typically used to add words to spam e-mails in order to bypass hash filters. Because the altered e-mail's hash sum differs from the sums of e-mails previously identified as spam, the filter does not classify it as spam, and it is delivered as if it were a normal message.

A hash filter creates a hash sum from data, typically an e-mail, and compares that sum against previously defined sums. Depending on the purpose of the filter, the data is then accepted or rejected based on whether its sum matches an existing one.
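The behaviour described above can be illustrated with a minimal sketch. It assumes an exact-match filter built on SHA-256; the `HashFilter` class, its method names, and the sample message are hypothetical and not taken from any particular spam filter.

```python
import hashlib


def hash_sum(message: str) -> str:
    """Return the SHA-256 hex digest of a message (SHA-256 is an assumption)."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


class HashFilter:
    """Hypothetical exact-match filter that remembers hash sums of known spam."""

    def __init__(self):
        self.known_spam_sums = set()

    def mark_as_spam(self, message: str) -> None:
        # Store only the hash sum, not the message itself.
        self.known_spam_sums.add(hash_sum(message))

    def is_spam(self, message: str) -> bool:
        # A message is flagged only if its sum matches a stored sum exactly.
        return hash_sum(message) in self.known_spam_sums


spam_filter = HashFilter()
original = "Buy cheap watches now!"
spam_filter.mark_as_spam(original)

# A few appended characters are enough to produce a different hash sum.
altered = original + " xqzv"

print(spam_filter.is_spam(original))  # True: the sum matches a stored sum
print(spam_filter.is_spam(altered))   # False: the sum no longer matches
```

Because the comparison is an exact match on the sum, any change to the input, however small, defeats this kind of filter. That is the weakness hash busters exploit.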

Hash busters can also be used to randomly add content to any kind of file until its hash sum reaches a particular target value. In an e-mail context, this could be used to bypass a filter which only accepts e-mails with a certain sum.
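As a rough sketch of this "add content until the sum matches" idea, the following appends random suffixes until the SHA-256 digest begins with a chosen prefix. The function name `find_suffix`, its parameters, and the two-character target are illustrative assumptions. Matching a complete cryptographic digest this way is computationally infeasible, so only a short prefix is targeted here.

```python
import hashlib
import random
import string


def find_suffix(data: bytes, target_prefix: str, seed: int = 0,
                max_tries: int = 1_000_000):
    """Try random 8-letter suffixes until sha256(data + suffix) starts with
    target_prefix (hex). Returns the suffix, or None if max_tries is exhausted."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    for _ in range(max_tries):
        suffix = "".join(rng.choices(string.ascii_lowercase, k=8)).encode("ascii")
        digest = hashlib.sha256(data + suffix).hexdigest()
        if digest.startswith(target_prefix):
            return suffix
    return None


data = b"example file contents"
suffix = find_suffix(data, "00")
if suffix is not None:
    print(suffix, hashlib.sha256(data + suffix).hexdigest())
```

Each extra hexadecimal digit in the target multiplies the expected number of attempts by 16, which is why this approach is only practical against short or weak sums.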

Initially, spam containing "white noise" from hash busters tended simply to exhibit 'paragraphs' of literally random words, but this filler text increasingly appears somewhat grammatical.

See also

Cryptographic hash function: special class of hash function that has certain properties which make it suitable for use in cryptography

A cryptographic hash function is a special class of hash function that has certain properties which make it suitable for use in cryptography. It is a mathematical algorithm that maps data of arbitrary size to a bit string of a fixed size and is designed to be a one-way function, that is, a function which is infeasible to invert. The only way to recreate the input data from an ideal cryptographic hash function's output is to attempt a brute-force search of possible inputs to see if they produce a match, or use a rainbow table of matched hashes. Bruce Schneier has called one-way hash functions "the workhorses of modern cryptography". The input data is often called the message, and the output is often called the message digest or simply the digest.

Bayesian poisoning is a technique used by e-mail spammers to attempt to degrade the effectiveness of spam filters that rely on Bayesian spam filtering. Bayesian filtering relies on Bayesian probability to determine whether an incoming mail is spam or is not spam. The spammer hopes that the addition of random words that are unlikely to appear in a spam message will cause the spam filter to believe the message to be legitimate—a statistical type II error.

Locality-sensitive hashing (LSH) reduces the dimensionality of high-dimensional data. LSH hashes input items so that similar items map to the same “buckets” with high probability. LSH differs from conventional and cryptographic hash functions because it aims to maximize the probability of a “collision” for similar items. Locality-sensitive hashing has much in common with data clustering and nearest neighbor search.

Related Research Articles

Checksum: small-size datum computed from an arbitrary block of digital data for the purpose of detecting errors

A checksum is a small-sized datum derived from a block of digital data for the purpose of detecting errors that may have been introduced during its transmission or storage. It is usually applied to an installation file after it is received from the download server. By themselves, checksums are often used to verify data integrity but are not relied upon to verify data authenticity.

Open mail relay

An open mail relay is an SMTP server configured in such a way that it allows anyone on the Internet to send e-mail through it, not just mail destined to or originating from known users. This used to be the default configuration in many mail servers; indeed, it was the way the Internet was initially set up, but open mail relays have become unpopular because of their exploitation by spammers and worms. Many relays were closed, or were placed on blacklists by other servers.

Whitelisting is the practice of explicitly allowing some identified entities access to a particular privilege, service, mobility, access or recognition. It is the reverse of blacklisting.

Apache SpamAssassin: a computer program for e-mail spam filtering

Apache SpamAssassin is a computer program used for e-mail spam filtering. It uses a variety of spam-detection techniques, including DNS-based and fuzzy-checksum-based spam detection, Bayesian filtering, external programs, blacklists and online databases. It is released under the Apache License 2.0 and has been part of the Apache Software Foundation since 2004.

Naive Bayes spam filtering: technique for filtering spam e-mail

Naive Bayes classifiers are a popular statistical technique of e-mail filtering. They typically use bag of words features to identify spam e-mail, an approach commonly used in text classification.

Spam in blogs is a form of spamdexing. It is done by posting random comments, copying material from elsewhere that is not original, or promoting commercial services to blogs, wikis, guestbooks, or other publicly accessible online discussion boards. Any web application that accepts and displays hyperlinks submitted by visitors may be a target.

Hashcash is a proof-of-work system used to limit email spam and denial-of-service attacks, and more recently has become known for its use in bitcoin as part of the mining algorithm. Hashcash was proposed in 1997 by Adam Back and described more formally in Back's paper "Hashcash - A Denial of Service Counter-Measure".

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not – in other words, a query returns either "possibly in set" or "definitely not in set". Elements can be added to the set, but not removed; the more elements that are added to the set, the larger the probability of false positives.

Blue Frog: anti-spam software (defunct)

Blue Frog was a freely-licensed anti-spam tool produced by Blue Security Inc. and operated as part of a community-based system which tried to persuade spammers to remove community members' addresses from their mailing lists by automating the complaint process for each user as spam is received. Blue Security maintained these addresses in a hashed form in a Do Not Intrude Registry, and spammers could use free tools to clean their lists. The tool was discontinued in 2006.

Spam poetry, sometimes called spoetry, is poetic verse composed primarily from the subject lines or content of spam e-mail messages.

DomainKeys Identified Mail (DKIM) is an email authentication method designed to detect forged sender addresses in emails, a technique often used in phishing and email spam.

The bag-of-words model is a simplifying representation used in natural language processing and information retrieval (IR). In this model, a text is represented as the bag (multiset) of its words, disregarding grammar and even word order but keeping multiplicity. The bag-of-words model has also been used for computer vision.

Blacklist (computing): criteria to prohibit computer access

In computing, a blacklist or blocklist is a basic access control mechanism that allows through all elements, except those explicitly mentioned. Those items on the list are denied access. The opposite is a whitelist, which means only items on the list are let through whatever gate is being used. A greylist contains items that are temporarily blocked until an additional step is performed.

Forum spam

Forum spam consists of posts on Internet forums that contain related or unrelated advertisements, links to malicious websites, trolling, and abusive or otherwise unwanted information. Forum spam is usually posted onto message boards by automated spambots, or manually with unscrupulous intentions, with one idea in mind: to get the spam in front of readers who would not otherwise intentionally have anything to do with it.

Kleffman v. Vonage Holdings Corp., 232 P.3d 625, is a 2010 Supreme Court of California case certified by the United States Court of Appeals for the Ninth Circuit. The decision ruled that sending unsolicited e-mail advertisements using multiple domain names was not unlawful under California Business and Professions Code section 17529.5, subdivision (a)(2), which made it unlawful to advertise in a commercial e-mail advertisement that contained or was accompanied by falsified, misrepresented, or forged header information.

People tend to be much less bothered by spam slipping through filters into their mailbox than by having desired e-mail ("ham") blocked. Balancing false negatives against false positives is critical for a successful anti-spam system. As servers are not able to block all spam, there are some tools that help individual users control this balance.

SmartScreen is a cloud-based anti-phishing and anti-malware component included in several Microsoft products, including Windows 8 and later, Internet Explorer, Microsoft Edge and Outlook.com. It is designed to help protect users against attacks that utilize social engineering and drive-by downloads to infect a system by scanning URLs accessed by a user against a blacklist of websites containing known threats. With the Windows 10 Creators Update, Microsoft placed the SmartScreen settings into the Windows Defender Security Center.

In machine learning, feature hashing, also known as the hashing trick, is a fast and space-efficient way of vectorizing features, i.e. turning arbitrary features into indices in a vector or matrix. It works by applying a hash function to the features and using their hash values as indices directly, rather than looking the indices up in an associative array. This trick is often attributed to Weinberger et al., but there exists a much earlier description of this method published by John Moody in 1989.

References

  1. Delio, Michelle (13 January 2004). "Random Acts of Spamness". Wired Tech Biz. Wired Magazine. Retrieved 24 September 2011.