Collision attack

Last updated May 18, 2024

In cryptography, a collision attack on a cryptographic hash tries to find two inputs producing the same hash value, i.e. a hash collision. This is in contrast to a preimage attack where a specific target hash value is specified.

Classical collision attack

Much like symmetric-key ciphers are vulnerable to brute force attacks, every cryptographic hash function is inherently vulnerable to collisions using a birthday attack. Due to the birthday problem, these attacks are much faster than a brute force would be. A hash of n bits can be broken in 2^n/2 time steps (evaluations of the hash function).

Mathematically stated, a collision attack finds two different messages m1 and m2, such that hash(m1) = hash(m2). In a classical collision attack, the attacker has no control over the content of either message, but they are arbitrarily chosen by the algorithm.

More efficient attacks are possible by employing cryptanalysis to specific hash functions. When a collision attack is discovered and is found to be faster than a birthday attack, a hash function is often denounced as "broken". The NIST hash function competition was largely induced by published collision attacks against two very commonly used hash functions, MD5 ^[1] and SHA-1. The collision attacks against MD5 have improved so much that, as of 2007, it takes just a few seconds on a regular computer.^[2] Hash collisions created this way are usually constant length and largely unstructured, so cannot directly be applied to attack widespread document formats or protocols.

However, workarounds are possible by abusing dynamic constructs present in many formats. In this way, two documents would be created which are as similar as possible in order to have the same hash value. One document would be shown to an authority to be signed, and then the signature could be copied to the other file. Such a malicious document would contain two different messages in the same document, but conditionally display one or the other through subtle changes to the file:

Some document formats like PostScript, or macros in Microsoft Word, have conditional constructs.^[3]^[4] (if-then-else) that allow testing whether a location in the file has one value or another in order to control what is displayed.
TIFF files can contain cropped images, with a different part of an image being displayed without affecting the hash value.^[4]
PDF files are vulnerable to collision attacks by using color value (such that text of one message is displayed with a white color that blends into the background, and text of the other message is displayed with a dark color) which can then be altered to change the signed document's content.^[4]

Chosen-prefix collision attack

An extension of the collision attack is the chosen-prefix collision attack, which is specific to Merkle–Damgård hash functions. In this case, the attacker can choose two arbitrarily different documents, and then append different calculated values that result in the whole documents having an equal hash value. This attack is normally harder, a hash of n bits can be broken in 2^(n/2)+1 time steps, but is much more powerful than a classical collision attack.

Mathematically stated, given two different prefixes p₁, p₂, the attack finds two suffixes s₁ and s₂ such that hash(p₁ ∥ s₁) = hash(p₂ ∥ s₂) (where ∥ is the concatenation operation).

More efficient attacks are also possible by employing cryptanalysis to specific hash functions. In 2007, a chosen-prefix collision attack was found against MD5, requiring roughly 2⁵⁰ evaluations of the MD5 function. The paper also demonstrates two X.509 certificates for different domain names, with colliding hash values. This means that a certificate authority could be asked to sign a certificate for one domain, and then that certificate (specially its signature) could be used to create a new rogue certificate to impersonate another domain.^[5]

A real-world collision attack was published in December 2008 when a group of security researchers published a forged X.509 signing certificate that could be used to impersonate a certificate authority, taking advantage of a prefix collision attack against the MD5 hash function. This meant that an attacker could impersonate any SSL-secured website as a man-in-the-middle, thereby subverting the certificate validation built in every web browser to protect electronic commerce. The rogue certificate may not be revokable by real authorities, and could also have an arbitrary forged expiry time. Even though MD5 was known to be very weak in 2004,^[1] certificate authorities were still willing to sign MD5-verified certificates in December 2008,^[6] and at least one Microsoft code-signing certificate was still using MD5 in May 2012.

The Flame malware successfully used a new variation of a chosen-prefix collision attack to spoof code signing of its components by a Microsoft root certificate that still used the compromised MD5 algorithm.^[7]^[8]

In 2019, researchers found a chosen-prefix collision attack against SHA-1 with computing complexity between 2^66.9 and 2^69.4 and cost less than 100,000 US dollars. ^[9]^[10] In 2020, researchers reduced the complexity of a chosen-prefix collision attack against SHA-1 to 2^63.4. ^[11]

Attack scenarios

Many applications of cryptographic hash functions do not rely on collision resistance, thus collision attacks do not affect their security. For example, HMACs are not vulnerable.^[12] For the attack to be useful, the attacker must be in control of the input to the hash function.

Digital signatures

Because digital signature algorithms cannot sign a large amount of data efficiently, most implementations use a hash function to reduce ("compress") the amount of data that needs to be signed down to a constant size. Digital signature schemes often become vulnerable to hash collisions as soon as the underlying hash function is practically broken; techniques like randomized (salted) hashing will buy extra time by requiring the harder preimage attack.^[13]

The usual attack scenario goes like this:

Mallory creates two different documents A and B that have an identical hash value, i.e., a collision. Mallory seeks to deceive Bob into accepting document B, ostensibly from Alice.
Mallory sends document A to Alice, who agrees to what the document says, signs its hash, and sends the signature to Mallory.
Mallory attaches the signature from document A to document B.
Mallory then sends the signature and document B to Bob, claiming that Alice signed B. Because the digital signature matches document B's hash, Bob's software is unable to detect the substitution.^{[ citation needed ]}

In 2008, researchers used a chosen-prefix collision attack against MD5 using this scenario, to produce a rogue certificate authority certificate. They created two versions of a TLS public key certificate, one of which appeared legitimate and was submitted for signing by the RapidSSL certificate authority. The second version, which had the same MD5 hash, contained flags which signal web browsers to accept it as a legitimate authority for issuing arbitrary other certificates.^[14]

Hash flooding

Hash flooding (also known as HashDoS^[15]) is a denial of service attack that uses hash collisions to exploit the worst-case (linear probe) runtime of hash table lookups.^[16] It was originally described in 2003. To execute such an attack, the attacker sends the server multiple pieces of data that hash to the same value and then tries to get the server to perform slow lookups. As the main focus of hash functions used in hash tables was speed instead of security, most major programming languages were affected,^[17] with new vulnerabilities of this class still showing up a decade after the original presentation.^[16]

To prevent hash flooding without making the hash function overly complex, newer keyed hash functions are introduced, with the security objective that collisions are hard to find as long as the key is unknown. They may be slower than previous hashes, but are still much easier to compute than cryptographic hashes. As of 2021, Jean-Philippe Aumasson and Daniel J. Bernstein's SipHash (2012) is the most widely-used hash function in this class.^[18] (Non-keyed "simple" hashes remain safe to use as long as the application's hash table is not controllable from the outside.)

It is possible to perform an analogous attack to fill up Bloom filters using a (partial) preimage attack.^[19]

Related Research Articles

<span class="mw-page-title-main">HMAC</span> Computer communications hash algorithm

In cryptography, an HMAC is a specific type of message authentication code (MAC) involving a cryptographic hash function and a secret cryptographic key. As with any MAC, it may be used to simultaneously verify both the data integrity and authenticity of a message. An HMAC is a type of keyed hash function that can also be used in a key derivation scheme or a key stretching scheme.

The MD5 message-digest algorithm is a widely used hash function producing a 128-bit hash value. MD5 was designed by Ronald Rivest in 1991 to replace an earlier hash function MD4, and was specified in 1992 as RFC 1321.

In cryptography, SHA-1 is a hash function which takes an input and produces a 160-bit (20-byte) hash value known as a message digest – typically rendered as 40 hexadecimal digits. It was designed by the United States National Security Agency, and is a U.S. Federal Information Processing Standard. The algorithm has been cryptographically broken but is still widely used.

A birthday attack is a bruteforce collision attack that exploits the mathematics behind the birthday problem in probability theory. This attack can be used to abuse communication between two or more parties. The attack depends on the higher likelihood of collisions found between random attack attempts and a fixed degree of permutations (pigeonholes). With a birthday attack, it is possible to find a collision of a hash function with $chance in, with being the classical preimage resistance security with the same probability. There is a general result that quantum computers can perform birthday attacks, thus breaking collision resistance, in .$

A cryptographic hash function (CHF) is a hash algorithm that has special properties desirable for a cryptographic application:

In cryptography, Tiger is a cryptographic hash function designed by Ross Anderson and Eli Biham in 1995 for efficiency on 64-bit platforms. The size of a Tiger hash value is 192 bits. Truncated versions can be used for compatibility with protocols assuming a particular hash size. Unlike the SHA-2 family, no distinguishing initialization values are defined; they are simply prefixes of the full Tiger/192 hash value.

<span class="mw-page-title-main">MD4</span> Cryptographic hash function

The MD4 Message-Digest Algorithm is a cryptographic hash function developed by Ronald Rivest in 1990. The digest length is 128 bits. The algorithm has influenced later designs, such as the MD5, SHA-1 and RIPEMD algorithms. The initialism "MD" stands for "Message Digest".

File verification is the process of using an algorithm for verifying the integrity of a computer file, usually by checksum. This can be done by comparing two files bit-by-bit, but requires two copies of the same file, and may miss systematic corruptions which might occur to both files. A more popular approach is to generate a hash of the copied file and comparing that to the hash of the original file.

In cryptography, a preimage attack on cryptographic hash functions tries to find a message that has a specific hash value. A cryptographic hash function should resist attacks on its preimage.

The MD2 Message-Digest Algorithm is a cryptographic hash function developed by Ronald Rivest in 1989. The algorithm is optimized for 8-bit computers. MD2 is specified in IETF RFC 1319. The "MD" in MD2 stands for "Message Digest".

Digest access authentication is one of the agreed-upon methods a web server can use to negotiate credentials, such as username or password, with a user's web browser. This can be used to confirm the identity of a user before sending sensitive information, such as online banking transaction history. It applies a hash function to the username and password before sending them over the network. In contrast, basic access authentication uses the easily reversible Base64 encoding instead of hashing, making it non-secure unless used in conjunction with TLS.

SHA-2 is a set of cryptographic hash functions designed by the United States National Security Agency (NSA) and first published in 2001. They are built using the Merkle–Damgård construction, from a one-way compression function itself built using the Davies–Meyer structure from a specialized block cipher.

In cryptography, collision resistance is a property of cryptographic hash functions: a hash function H is collision-resistant if it is hard to find two inputs that hash to the same output; that is, two inputs a and b where a ≠ b but H(a) = H(b). The pigeonhole principle means that any hash function with more inputs than outputs will necessarily have such collisions; the harder they are to find, the more cryptographically secure the hash function is.

In cryptography, a Lamport signature or Lamport one-time signature scheme is a method for constructing a digital signature. Lamport signatures can be built from any cryptographically secure one-way function; usually a cryptographic hash function is used.

In cryptography, the Merkle–Damgård construction or Merkle–Damgård hash function is a method of building collision-resistant cryptographic hash functions from collision-resistant one-way compression functions. This construction was used in the design of many popular hash algorithms such as MD5, SHA-1 and SHA-2.

In public-key cryptography, a public key fingerprint is a short sequence of bytes used to identify a longer public key. Fingerprints are created by applying a cryptographic hash function to a public key. Since fingerprints are shorter than the keys they refer to, they can be used to simplify certain key management tasks. In Microsoft software, "thumbprint" is used instead of "fingerprint."

This article summarizes publicly known attacks against cryptographic hash functions. Note that not all entries may be up to date. For a summary of other hash function parameters, see comparison of cryptographic hash functions.

In cryptography and computer security, a length extension attack is a type of attack where an attacker can use Hash(message₁) and the length of message₁ to calculate Hash(message₁ ‖ message₂) for an attacker-controlled message₂, without needing to know the content of message₁. This is problematic when the hash is used as a message authentication code with construction Hash(secret ‖ message), and message and the length of secret is known, because an attacker can include extra information at the end of the message and produce a valid hash without knowing the secret. Algorithms like MD5, SHA-1 and most of SHA-2 that are based on the Merkle–Damgård construction are susceptible to this kind of attack. Truncated versions of SHA-2, including SHA-384 and SHA-512/256 are not susceptible, nor is the SHA-3 algorithm. HMAC also uses a different construction and so is not vulnerable to length extension attacks.

In cryptography, the Salted Challenge Response Authentication Mechanism (SCRAM) is a family of modern, password-based challenge–response authentication mechanisms providing authentication of a user to a server. As it is specified for Simple Authentication and Security Layer (SASL), it can be used for password-based logins to services like LDAP, HTTP, SMTP, POP3, IMAP and JMAP (e-mail), XMPP (chat), or MongoDB and PostgreSQL (databases). For XMPP, supporting it is mandatory.

Dr. ir. Marc Stevens is a cryptology researcher most known for his work on cryptographic hash collisions and for the creation of the chosen-prefix hash collision tool HashClash as part of his master's degree thesis. He first gained international attention for his work with Alexander Sotirov, Jacob Appelbaum, Arjen Lenstra, David Molnar, Dag Arne Osvik, and Benne de Weger in creating a rogue SSL certificate which was presented in 2008 during the 25th annual Chaos Communication Congress warning of the dangers of using the MD5 hash function in issuing SSL certificates. Several years later in 2012, according to Microsoft, the authors of the Flame malware used similar methodology to that which the researchers warned of by initiating an MD5 collision to forge a Windows code-signing certificate. Marc was most recently awarded the Google Security Privacy and Anti-abuse applied award. Google selected Stevens for this award in recognition of his work in Cryptanalysis, in particular related to the SHA-1 hash function.

References

1 2 Xiaoyun Wang, Dengguo Feng, Xuejia Lai, Hongbo Yu: Collisions for Hash Functions MD4, MD5, HAVAL-128 and RIPEMD, Cryptology ePrint Archive Report 2004/199, 16 Aug 2004, revised 17 Aug 2004. Retrieved July 27, 2008.
↑ M.M.J. Stevens (June 2007). "On Collisions for MD5" (PDF). [...] we are able to find collisions for MD5 in about 2^24.1 compressions for recommended IHVs which takes approx. 6 seconds on a 2.6GHz Pentium 4.{{cite journal}}: Cite journal requires |journal= (help)
↑ Magnus Daum; Stefan Lucks. "Hash Collisions (The Poisoned Message Attack)". Eurocrypt 2005 rump session. Archived from the original on 2010-03-27.
1 2 3 Max Gebhardt; Georg Illies; Werner Schindler (4 January 2017). "A Note on the Practical Value of Single Hash Collisions for Special File Formats" (PDF).{{cite journal}}: Cite journal requires |journal= (help)
↑ Marc Stevens; Arjen Lenstra; Benne de Weger (2007-11-30). "Chosen-Prefix Collisions for MD5 and Colliding X.509 Certificates for Different Identities". Advances in Cryptology - EUROCRYPT 2007. Lecture Notes in Computer Science. Vol. 4515. p. 1. Bibcode:2007LNCS.4515....1S. doi: 10.1007/978-3-540-72540-4_1 . ISBN 978-3-540-72539-8.
↑ Alexander Sotirov; et al. (2008-12-30). "Creating a rogue CA certificate". Archived from the original on 2012-04-18. Retrieved 2009-10-07.
↑ "Microsoft releases Security Advisory 2718704". Microsoft. 3 June 2012. Archived from the original on 7 June 2012. Retrieved 4 June 2012.
↑ Marc Stevens (7 June 2012). "CWI Cryptanalist Discovers New Cryptographic Attack Variant in Flame Spy Malware". Centrum Wiskunde & Informatica. Retrieved 9 June 2012.
↑ Catalin Cimpanu (2019-05-13). "SHA-1 collision attacks are now actually practical and a looming danger". ZDNet.
↑ Gaëtan Leurent; Thomas Peyrin (2019-05-06). "From Collisions to Chosen-Prefix Collisions Application to Full SHA-1" (PDF).
↑ Gaëtan Leurent; Thomas Peyrin (2020-01-05). "SHA-1 is a Shambles - First Chosen-Prefix Collision on SHA-1 and Application to the PGP Web of Trust" (PDF).
↑ "Hash Collision Q&A". Cryptography Research Inc. 2005-02-15. Archived from the original on 2008-07-17. Because of the way hash functions are used in the HMAC construction, the techniques used in these recent attacks do not apply
↑ Shai Halevi and Hugo Krawczyk, Randomized Hashing and Digital Signatures Archived 2009-06-20 at the Wayback Machine
↑ Alexander Sotirov; Marc Stevens; Jacob Appelbaum; Arjen Lenstra; David Molnar; Dag Arne Osvik; Benne de Weger (30 December 2008). MD5 considered harmful today. Chaos Communication Congress 2008.
↑ Falkenberg, Andreas; Mainka, Christian; Somorovsky, Juraj; Schwenk, Jörg (2013). "A New Approach towards DoS Penetration Testing on Web Services". 2013 IEEE 20th International Conference on Web Services. pp. 491–498. doi:10.1109/ICWS.2013.72. ISBN 978-0-7695-5025-1. S2CID 17805370.
1 2 "About that hash flooding vulnerability in Node.js... · V8". v8.dev.
↑ Scott A. Crosby and Dan S. Wallach. 2003. Denial of service via algorithmic complexity attacks. In Proceedings of the 12th conference on USENIX Security Symposium - Volume 12 (SSYM'03), Vol. 12. USENIX Association, Berkeley, CA, USA, 3-3.
↑ Jean-Philippe Aumasson & Daniel J. Bernstein (2012-09-18). "SipHash: a fast short-input PRF" (PDF).
↑ Gerbet, Thomas; Kumar, Amrit; Lauradoux, Cédric (12 November 2014). The Power of Evil Choices in Bloom Filters (report). INRIA Grenoble.

External links

"Meaningful Collisions", attack scenarios for exploiting cryptographic hash collisions
Fast MD5 and MD4 Collision Generators - Bishop Fox (formerly Stach & Liu). Create MD4 and MD5 hash collisions using groundbreaking new code that improves upon the techniques originally developed by Xiaoyun Wang. Using a 1.6 GHz Pentium 4, MD5 collisions can be generated in an average of 45 minutes, and MD4 collisions can be generated in an average of 5 seconds. Originally released on 22Jun2006.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[md5-2004-1] 1 2 Xiaoyun Wang, Dengguo Feng, Xuejia Lai, Hongbo Yu: Collisions for Hash Functions MD4, MD5, HAVAL-128 and RIPEMD, Cryptology ePrint Archive Report 2004/199, 16 Aug 2004, revised 17 Aug 2004. Retrieved July 27, 2008.

[2] M.M.J. Stevens (June 2007). "On Collisions for MD5" (PDF). [...] we are able to find collisions for MD5 in about 2^24.1 compressions for recommended IHVs which takes approx. 6 seconds on a 2.6GHz Pentium 4.{{cite journal}}: Cite journal requires |journal= (help)

[3] Magnus Daum; Stefan Lucks. "Hash Collisions (The Poisoned Message Attack)". Eurocrypt 2005 rump session. Archived from the original on 2010-03-27.

[special-file-formats-4] 1 2 3 Max Gebhardt; Georg Illies; Werner Schindler (4 January 2017). "A Note on the Practical Value of Single Hash Collisions for Special File Formats" (PDF).{{cite journal}}: Cite journal requires |journal= (help)

[md5-chosen-2007-5] Marc Stevens; Arjen Lenstra; Benne de Weger (2007-11-30). "Chosen-Prefix Collisions for MD5 and Colliding X.509 Certificates for Different Identities". Advances in Cryptology - EUROCRYPT 2007. Lecture Notes in Computer Science. Vol. 4515. p. 1. Bibcode:2007LNCS.4515....1S. doi: 10.1007/978-3-540-72540-4_1 . ISBN 978-3-540-72539-8.

[6] Alexander Sotirov; et al. (2008-12-30). "Creating a rogue CA certificate". Archived from the original on 2012-04-18. Retrieved 2009-10-07.

[7] "Microsoft releases Security Advisory 2718704". Microsoft. 3 June 2012. Archived from the original on 7 June 2012. Retrieved 4 June 2012.

[8] Marc Stevens (7 June 2012). "CWI Cryptanalist Discovers New Cryptographic Attack Variant in Flame Spy Malware". Centrum Wiskunde & Informatica. Retrieved 9 June 2012.

[SHA-1_collision_attacks_are_now_actually_practical_and_a_looming_danger_2019-9] Catalin Cimpanu (2019-05-13). "SHA-1 collision attacks are now actually practical and a looming danger". ZDNet.

[Collisions_of_Chosen-Prefix_Collisions_SHA-1_2019-10] Gaëtan Leurent; Thomas Peyrin (2019-05-06). "From Collisions to Chosen-Prefix Collisions Application to Full SHA-1" (PDF).

[SHA-1_is_a_Shambles_-_First_Chosen-Prefix_Collision_on_SHA-1_and_Application_to_the_PGP_Web_of_Trust-11] Gaëtan Leurent; Thomas Peyrin (2020-01-05). "SHA-1 is a Shambles - First Chosen-Prefix Collision on SHA-1 and Application to the PGP Web of Trust" (PDF).

[collision-qna-12] "Hash Collision Q&A". Cryptography Research Inc. 2005-02-15. Archived from the original on 2008-07-17. Because of the way hash functions are used in the HMAC construction, the techniques used in these recent attacks do not apply

[13] Shai Halevi and Hugo Krawczyk, Randomized Hashing and Digital Signatures Archived 2009-06-20 at the Wayback Machine

[14] Alexander Sotirov; Marc Stevens; Jacob Appelbaum; Arjen Lenstra; David Molnar; Dag Arne Osvik; Benne de Weger (30 December 2008). MD5 considered harmful today. Chaos Communication Congress 2008.

[15] Falkenberg, Andreas; Mainka, Christian; Somorovsky, Juraj; Schwenk, Jörg (2013). "A New Approach towards DoS Penetration Testing on Web Services". 2013 IEEE 20th International Conference on Web Services. pp. 491–498. doi:10.1109/ICWS.2013.72. ISBN 978-0-7695-5025-1. S2CID 17805370.

[v8-16] 1 2 "About that hash flooding vulnerability in Node.js... · V8". v8.dev.

[17] Scott A. Crosby and Dan S. Wallach. 2003. Denial of service via algorithmic complexity attacks. In Proceedings of the 12th conference on USENIX Security Symposium - Volume 12 (SSYM'03), Vol. 12. USENIX Association, Berkeley, CA, USA, 3-3.

[SipHash-18] Jean-Philippe Aumasson & Daniel J. Bernstein (2012-09-18). "SipHash: a fast short-input PRF" (PDF).

[19] Gerbet, Thomas; Kumar, Amrit; Lauradoux, Cédric (12 November 2014). The Power of Evil Choices in Bloom Filters (report). INRIA Grenoble.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]