UUHash is a hash algorithm used by clients on the FastTrack network. It was chosen because it can hash very large files very quickly, even on older computers. However, this speed is achieved by hashing only a fraction of the file. This weakness makes it trivial to create a hash collision, allowing large sections of a file to be completely altered without changing the checksum.
UUHash is used by Kazaa. Anti-P2P agencies exploit its weakness to corrupt downloads. [1]
The UUHash is a 160-bit string that is usually Base64-encoded for presentation. It is a concatenation of an MD5 hash and a CRC32 sum of selected chunks of the file. [2] [3]
The first 307,200 bytes (300 KiB, one "chunk size") of the file are MD5-hashed (fewer if the file is shorter). The 32-bit little-endian integer value smallhash is initialized to 0.
If the file is strictly larger than one chunk size, a series of chunks at file offsets of 2^n MiB (n ≥ 0, i.e. 1 MiB, 2 MiB, 4 MiB, 8 MiB and so on) and one chunk right at the end of the file are hashed using CRC32 (polynomial 0xEDB88320 reversed, 0x04C11DB7 normal). The last chunk of the power-of-two series ends strictly more than one chunk size before the end of the file, i.e. there is always at least one unread byte between the last two chunks (if there are that many chunks). [footnote 1] The end-of-file chunk may be shorter than one chunk size; it starts at or after one chunk size into the file. The CRC is initialized using smallhash, and the result is stored back into smallhash.
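So, for example, a large file is hashed as follows:
- offset 0 MiB, 300 KiB hashed with MD5
- offset 1 MiB, 300 KiB hashed with CRC32
- offset 2 MiB, 300 KiB hashed with CRC32
- offset 4 MiB, 300 KiB hashed with CRC32
- offset 8 MiB, 300 KiB hashed with CRC32
- ...
- last 300 KiB of the file hashed with CRC32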
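The chunk selection can be expressed as a small Python function that, given only the file size, lists the (offset, length) ranges fed to MD5 and to CRC32. This is a sketch of the rules as described in this section; the function name `uuhash_chunks`, the `Range` alias and the return shape are illustrative assumptions, not part of sig2dat or Kazaa.

```python
CHUNK = 307_200  # one "chunk size": 300 KiB
MIB = 1_048_576  # 1 MiB

Range = tuple[int, int]  # (offset, length) in bytes


def uuhash_chunks(size: int) -> tuple[Range, list[Range]]:
    """Return the MD5 range and the CRC32 ranges for a file of `size` bytes.

    Hypothetical helper: only positions are computed; no file data is read.
    """
    # MD5 covers the first chunk, or the whole file if it is shorter.
    md5_range = (0, min(size, CHUNK))
    crc_ranges: list[Range] = []
    if size <= CHUNK:
        return md5_range, crc_ranges  # no CRC part at all

    # Power-of-two series at 1, 2, 4, 8, ... MiB; a chunk is used only if it
    # ends strictly more than one chunk size before the end of the file.
    offset = MIB
    while offset + 2 * CHUNK < size:
        crc_ranges.append((offset, CHUNK))
        offset *= 2

    # End-of-file chunk: the last 300 KiB, but never starting before CHUNK,
    # so it may be shorter than one chunk size.
    start = max(size - CHUNK, CHUNK)
    crc_ranges.append((start, size - start))
    return md5_range, crc_ranges
```

For a 16 MiB file, for instance, this yields CRC ranges at 1, 2, 4 and 8 MiB plus the final 300 KiB starting at offset 16,470,016. The 16 MiB chunk only joins the series once the file exceeds 16 MiB + 600 KiB.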
Finally, the bitwise complement of smallhash (still zero for files up to 300 KiB) is XORed with the file size in bytes. The 160-bit UUHash is the concatenation of the 128-bit MD5 hash and this final 32-bit value in little-endian byte order.
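Putting the steps together, a minimal Python sketch of the whole computation might look like the following. It assumes a seekable binary file object, chains the CRC over the chunks in ascending file order, and truncates the file size to 32 bits before the XOR (the behaviour for files over 4 GiB is not described here). The names `uuhash` and `uuhash_b64` are illustrative, not taken from sig2dat. Only the selected chunks are read, which is what makes the hash fast and also what makes it weak.

```python
import base64
import hashlib
import io
import zlib
from typing import BinaryIO

CHUNK = 307_200  # one "chunk size": 300 KiB
MIB = 1_048_576  # 1 MiB


def uuhash(f: BinaryIO) -> bytes:
    """Return the 20-byte (160-bit) UUHash of a seekable binary file."""
    size = f.seek(0, io.SEEK_END)  # seek() returns the new absolute position

    # 128-bit part: MD5 of the first chunk (the whole file if it is shorter).
    f.seek(0)
    md5 = hashlib.md5(f.read(CHUNK)).digest()

    smallhash = 0
    if size > CHUNK:
        # Power-of-two offsets 1, 2, 4, ... MiB whose chunk ends strictly more
        # than one chunk size before EOF, followed by the end-of-file chunk.
        offsets = []
        offset = MIB
        while offset + 2 * CHUNK < size:
            offsets.append(offset)
            offset *= 2
        offsets.append(max(size - CHUNK, CHUNK))
        for off in offsets:  # assumed: ascending file order
            f.seek(off)
            # zlib.crc32 (reflected polynomial 0xEDB88320) continues from the
            # running value, so smallhash both seeds and receives each CRC.
            smallhash = zlib.crc32(f.read(CHUNK), smallhash)

    # Complement, XOR with the file size (kept to 32 bits), little-endian.
    final = (~smallhash ^ size) & 0xFFFFFFFF
    return md5 + final.to_bytes(4, "little")


def uuhash_b64(f: BinaryIO) -> str:
    """Base64 form of the hash, as used for presentation."""
    return base64.b64encode(uuhash(f)).decode("ascii")
```

For example, `uuhash_b64(io.BytesIO(b"\x00"))` returns `"k7iFrf4NoInN9jSQT9Wfcf7///8="`, matching the single-0x00-byte entry in the table of sig2dat test vectors. For a real file, pass the object returned by `open(path, "rb")`.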
Given below are hashes (Base64 and hexadecimal) for strings of various lengths containing only 0x00 or 0xFF bytes, generated by sig2dat.
Note that all strings at least one chunk long and consisting of the same byte value share the same 128-bit prefix, because their first chunks are identical (all 0x00 or all 0xFF). For files with the same number and lengths of CRC chunks, the 32-bit part differs only because of the file length mixed in at the end; this holds only because all chunks have identical content. For files up to 300 KiB, the file length can be read from the last four bytes of the hash: smallhash is still 0, so those bytes hold ~0 XOR file size, which is the bitwise complement of the file size.
Input (byte value, length) | Base64 | Hexadecimal (MD5 - final 32 bits) |
---|---|---|
0 bytes | 1B2M2Y8AsgTpgAmY7PhCfv////8= | D41D8CD98F00B204E9800998ECF8427E-FFFFFFFF |
0x00 , 1 byte | k7iFrf4NoInN9jSQT9Wfcf7///8= | 93B885ADFE0DA089CDF634904FD59F71-FEFFFFFF |
0xFF , 1 byte | AFlP1PQrpD/BygQnoFdilf7///8= | 00594FD4F42BA43FC1CA0427A0576295-FEFFFFFF |
0x00 , 2 bytes | xBA/Ei0nZ3ydsUTK4TlKZv3///8= | C4103F122D27677C9DB144CAE1394A66-FDFFFFFF |
0xFF , 2 bytes | qyoNKN5rd//dbHKv6tCZq/3///8= | AB2A0D28DE6B77FFDD6C72AFEAD099AB-FDFFFFFF |
0x00 , 307199 bytes (300 KiB - 1) | YK6+Fj6S4MGzEC9H9Bn3gQBQ+/8= | 60AEBE163E92E0C1B3102F47F419F781-0050FBFF |
0xFF , 307199 bytes (300 KiB - 1) | I+QujFtxa9pBOt5X6NMGsgBQ+/8= | 23E42E8C5B716BDA413ADE57E8D306B2-0050FBFF |
0x00 , 307200 bytes (300 KiB) | kK7e2ZIs+JRup4WGNUk3JP9P+/8= | 90AEDED9922CF8946EA7858635493724-FF4FFBFF |
0xFF , 307200 bytes (300 KiB) | oBSYynx6vdDeUWtP5N4mAv9P+/8= | A01498CA7C7ABDD0DE516B4FE4DE2602-FF4FFBFF |
0x00 , 307201 bytes (300 KiB + 1) | kK7e2ZIs+JRup4WGNUk3JHOg+S0= | 90AEDED9922CF8946EA7858635493724-73A0F92D |
0xFF , 307201 bytes (300 KiB + 1) | oBSYynx6vdDeUWtP5N4mAv5P+wA= | A01498CA7C7ABDD0DE516B4FE4DE2602-FE4FFB00 |
0x00 , 614399 bytes (600 KiB - 1) | kK7e2ZIs+JRup4WGNUk3JHCHqBQ= | 90AEDED9922CF8946EA7858635493724-7087A814 |
0xFF , 614399 bytes (600 KiB - 1) | oBSYynx6vdDeUWtP5N4mAqgX6Xs= | A01498CA7C7ABDD0DE516B4FE4DE2602-A817E97B |
0x00 , 614400 bytes (600 KiB) | kK7e2ZIs+JRup4WGNUk3JGlfGn0= | 90AEDED9922CF8946EA7858635493724-695F1A7D |
0xFF , 614400 bytes (600 KiB) | oBSYynx6vdDeUWtP5N4mApKrf9g= | A01498CA7C7ABDD0DE516B4FE4DE2602-92AB7FD8 |
0x00 , 614401 bytes (600 KiB + 1) | kK7e2ZIs+JRup4WGNUk3JGhfGn0= | 90AEDED9922CF8946EA7858635493724-685F1A7D |
0xFF , 614401 bytes (600 KiB + 1) | oBSYynx6vdDeUWtP5N4mApOrf9g= | A01498CA7C7ABDD0DE516B4FE4DE2602-93AB7FD8 |
0x00 , 614402 bytes (600 KiB + 2) | kK7e2ZIs+JRup4WGNUk3JGtfGn0= | 90AEDED9922CF8946EA7858635493724-6B5F1A7D |
0xFF , 614402 bytes (600 KiB + 2) | oBSYynx6vdDeUWtP5N4mApCrf9g= | A01498CA7C7ABDD0DE516B4FE4DE2602-90AB7FD8 |
0x00 , 16777216 bytes (16 MiB) | kK7e2ZIs+JRup4WGNUk3JN/b8qg= | 90AEDED9922CF8946EA7858635493724-DFDBF2A8 |
0xFF , 16777216 bytes (16 MiB) | oBSYynx6vdDeUWtP5N4mAt0YF2Y= | A01498CA7C7ABDD0DE516B4FE4DE2602-DD181766 |
0x00 , 17084416 bytes (16 MiB + 300 KiB) | kK7e2ZIs+JRup4WGNUk3JN9r9qg= | 90AEDED9922CF8946EA7858635493724-DF6BF6A8 |
0xFF , 17084416 bytes (16 MiB + 300 KiB) | oBSYynx6vdDeUWtP5N4mAt2oE2Y= | A01498CA7C7ABDD0DE516B4FE4DE2602-DDA81366 |
0x00 , 17084417 bytes (16 MiB + 300 KiB + 1) | kK7e2ZIs+JRup4WGNUk3JN5r9qg= | 90AEDED9922CF8946EA7858635493724-DE6BF6A8 |
0xFF , 17084417 bytes (16 MiB + 300 KiB + 1) | oBSYynx6vdDeUWtP5N4mAtyoE2Y= | A01498CA7C7ABDD0DE516B4FE4DE2602-DCA81366 |
0x00 , 17391616 bytes (16 MiB + 600 KiB) | kK7e2ZIs+JRup4WGNUk3JN+7+6g= | 90AEDED9922CF8946EA7858635493724-DFBBFBA8 |
0xFF , 17391616 bytes (16 MiB + 600 KiB) | oBSYynx6vdDeUWtP5N4mAt14HmY= | A01498CA7C7ABDD0DE516B4FE4DE2602-DD781E66 |
0x00 , 17391617 bytes (16 MiB + 600 KiB + 1) | kK7e2ZIs+JRup4WGNUk3JNzVMqw= | 90AEDED9922CF8946EA7858635493724-DCD532AC |
0xFF , 17391617 bytes (16 MiB + 600 KiB + 1) | oBSYynx6vdDeUWtP5N4mAgS1uWk= | A01498CA7C7ABDD0DE516B4FE4DE2602-04B5B969 |
0x00 , 17391618 bytes (16 MiB + 600 KiB + 2) | kK7e2ZIs+JRup4WGNUk3JN/VMqw= | 90AEDED9922CF8946EA7858635493724-DFD532AC |
0xFF , 17391618 bytes (16 MiB + 600 KiB + 2) | oBSYynx6vdDeUWtP5N4mAge1uWk= | A01498CA7C7ABDD0DE516B4FE4DE2602-07B5B969 |
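The observation about small files can be checked with a short Python sketch that decodes a Base64 UUHash and complements its last four bytes. The helper name `small_file_length` is hypothetical; it only gives a meaningful answer when the file was at most 300 KiB, and a result within that range does not by itself prove the file was that small.

```python
import base64

CHUNK = 307_200  # 300 KiB


def small_file_length(uuhash_b64: str) -> int:
    """Recover the file length from the UUHash of a file of at most 300 KiB."""
    raw = base64.b64decode(uuhash_b64, validate=True)
    if len(raw) != 20:
        raise ValueError("a UUHash is 20 bytes (160 bits)")
    # The last four bytes hold ~smallhash XOR size in little-endian order.
    final = int.from_bytes(raw[16:], "little")
    # For small files smallhash is still 0, so complementing again gives the size.
    size = final ^ 0xFFFFFFFF
    if size > CHUNK:
        # Impossible for a small file, so the hash must belong to a larger one.
        raise ValueError("not the hash of a file of at most 300 KiB")
    return size
```

Applied to the rows above, the 0-byte, 2-byte, 307,199-byte and 307,200-byte hashes give back exactly those lengths.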
The name UUHash derives from the sig2dat utility, which creates URIs referencing files on Kazaa. These URIs have the form:
sig2dat://|File: surprise.mp3|Length:5845871Bytes|UUHash:=1LDYkHDl65OprVz37xN1VSo9b00=
Apart from the fact that this URI format is not RFC-compliant, note that the value labeled UUHash is the Base64 encoding of the hash, not the hash itself.
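Because the URI carries the Base64 text rather than the raw hash, a client has to decode it before comparing hashes. The following Python sketch splits a sig2dat URI on `|` and decodes the UUHash field. The field layout, including treating the `=` right after `UUHash:` as a separator (the Base64 of 20 bytes is exactly 28 characters, and those follow it), is inferred from the single example above; `parse_sig2dat` and `Sig2datLink` are hypothetical names.

```python
import base64
from typing import NamedTuple


class Sig2datLink(NamedTuple):
    file: str
    length: int
    uuhash: bytes  # decoded 20-byte hash, not its Base64 text


def parse_sig2dat(uri: str) -> Sig2datLink:
    """Parse a sig2dat URI of the form shown above (layout assumed)."""
    prefix = "sig2dat://"
    if not uri.startswith(prefix):
        raise ValueError("not a sig2dat URI")
    fields: dict[str, str] = {}
    for part in uri[len(prefix):].split("|"):
        key, sep, value = part.partition(":")  # split at the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    try:
        name = fields["File"]
        length = int(fields["Length"].removesuffix("Bytes"))
        # Assumed: the "=" right after "UUHash:" is a separator, not Base64.
        b64 = fields["UUHash"].removeprefix("=")
    except KeyError as missing:
        raise ValueError(f"missing field {missing}") from None
    raw = base64.b64decode(b64, validate=True)
    if len(raw) != 20:
        raise ValueError("UUHash must decode to 20 bytes (160 bits)")
    return Sig2datLink(name, length, raw)
```

On the example URI, `link.uuhash[:16].hex()` then gives the MD5 part and `link.uuhash[16:]` the final 32-bit value; since that file is larger than 300 KiB, its length cannot be recovered from the hash alone.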