MurmurHash is a non-cryptographic hash function suitable for general hash-based lookup. [1] [2] [3] It was created by Austin Appleby in 2008 [4] and, as of 8 January 2016, [5] is hosted on GitHub along with its test suite named SMHasher. It also exists in a number of variants, [6] all of which have been released into the public domain. The name comes from two basic operations, multiply (MU) and rotate (R), used in its inner loop.
Unlike cryptographic hash functions, it is not specifically designed to be difficult to reverse by an adversary, making it unsuitable for cryptographic purposes.
The original MurmurHash was created as an attempt to make a faster function than Lookup3. [7] Although successful, it had not been tested thoroughly and was not capable of providing 64-bit hashes as in Lookup3. Its design would be later built upon in MurmurHash2, combining a multiplicative hash (similar to the Fowler–Noll–Vo hash function) with an Xorshift.
MurmurHash2 [8] yields a 32- or 64-bit value. It comes in multiple variants, including some that allow incremental hashing and aligned or neutral versions.
The person who originally found the flaw[ clarification needed ] in MurmurHash2 created an unofficial 160-bit version of MurmurHash2 called MurmurHash2_160. [11]
The current version, completed April 3, 2011, is MurmurHash3, [12] [13] which yields a 32-bit or 128-bit hash value. When using 128-bits, the x86 and x64 versions do not produce the same values, as the algorithms are optimized for their respective platforms. MurmurHash3 was released alongside SMHasher, a hash function test suite.
The canonical implementation is in C++, but there are efficient ports for a variety of popular languages, including Python, [14] C, [15] Go, [16] C#, [13] [17] D, [18] Lua, Perl, [19] Ruby, [20] Rust, [21] PHP, [22] [23] Common Lisp, [24] Haskell, [25] Elm, [26] Clojure, [27] Scala, [28] Java, [29] [30] Erlang, [31] Swift, [32] Object Pascal, [33] Kotlin, [34] JavaScript, [35] and OCaml. [36]
It has been adopted into a number of open-source projects, most notably libstdc++ (ver 4.6), nginx (ver 1.0.1), [37] Rubinius, [38] libmemcached (the C driver for Memcached), [39] npm (nodejs package manager), [40] maatkit, [41] Hadoop, [1] Kyoto Cabinet, [42] Cassandra, [43] [44] Solr, [45] vowpal wabbit, [46] Elasticsearch, [47] Guava, [48] Kafka, [49] and RedHat Virtual Data Optimizer (VDO). [50]
Hash functions can be vulnerable to collision attacks, where a user can choose input data in such a way so as to intentionally cause hash collisions. Jean-Philippe Aumasson and Daniel J. Bernstein were able to show that even implementations of MurmurHash using a randomized seed are vulnerable to so-called HashDoS attacks. [51] With the use of differential cryptanalysis, they were able to generate inputs that would lead to a hash collision. The authors of the attack recommend using their own SipHash instead.
algorithm Murmur3_32 is// Note: In this version, all arithmetic is performed with unsigned 32-bit integers.// In the case of overflow, the result is reduced modulo 232.input:key, len, seed c1 ← 0xcc9e2d51 c2 ← 0x1b873593 r1 ← 15 r2 ← 13 m ← 5 n ← 0xe6546b64 hash ← seedfor each fourByteChunk of keydo k ← fourByteChunk k ← k × c1 k ← k ROL r1 k ← k × c2 hash ← hash XOR k hash ← hash ROL r2 hash ← (hash × m) + n with any remainingBytesInKey do remainingBytes ← SwapToLittleEndian(remainingBytesInKey) // Note: Endian swapping is only necessary on big-endian machines.// The purpose is to place the meaningful digits towards the low end of the value,// so that these digits have the greatest potential to affect the low range digits// in the subsequent multiplication. Consider that locating the meaningful digits// in the high range would produce a greater effect upon the high digits of the// multiplication, and notably, that such high digits are likely to be discarded// by the modulo arithmetic under overflow. We don't want that. remainingBytes ← remainingBytes × c1 remainingBytes ← remainingBytes ROL r1 remainingBytes ← remainingBytes × c2 hash ← hash XOR remainingBytes hash ← hash XOR len hash ← hash XOR (hash >> 16) hash ← hash × 0x85ebca6b hash ← hash XOR (hash >> 13) hash ← hash × 0xc2b2ae35 hash ← hash XOR (hash >> 16)
A sample C implementation follows (for little-endian CPUs):
staticinlineuint32_tmurmur_32_scramble(uint32_tk){k*=0xcc9e2d51;k=(k<<15)|(k>>17);k*=0x1b873593;returnk;}uint32_tmurmur3_32(constuint8_t*key,size_tlen,uint32_tseed){uint32_th=seed;uint32_tk;/* Read in groups of 4. */for(size_ti=len>>2;i;i--){// Here is a source of differing results across endiannesses.// A swap here has no effects on hash properties though.memcpy(&k,key,sizeof(uint32_t));key+=sizeof(uint32_t);h^=murmur_32_scramble(k);h=(h<<13)|(h>>19);h=h*5+0xe6546b64;}/* Read the rest. */k=0;for(size_ti=len&3;i;i--){k<<=8;k|=key[i-1];}// A swap is *not* necessary here because the preceding loop already// places the low bytes in the low places according to whatever endianness// we use. Swaps only apply when the memory is copied in a chunk.h^=murmur_32_scramble(k);/* Finalize. */h^=len;h^=h>>16;h*=0x85ebca6b;h^=h>>13;h*=0xc2b2ae35;h^=h>>16;returnh;}
Test string | Seed value | Hash value (hexadecimal) | Hash value (decimal) |
---|---|---|---|
0x00000000 | 0x00000000 | 0 | |
0x00000001 | 0x514E28B7 | 1,364,076,727 | |
0xffffffff | 0x81F16F39 | 2,180,083,513 | |
test | 0x00000000 | 0xba6bd213 | 3,127,628,307 |
test | 0x9747b28c | 0x704b81dc | 1,883,996,636 |
Hello, world! | 0x00000000 | 0xc0363e43 | 3,224,780,355 |
Hello, world! | 0x9747b28c | 0x24884CBA | 612,912,314 |
The quick brown fox jumps over the lazy dog | 0x00000000 | 0x2e4ff723 | 776,992,547 |
The quick brown fox jumps over the lazy dog | 0x9747b28c | 0x2FA826CD | 799,549,133 |
Blowfish is a symmetric-key block cipher, designed in 1993 by Bruce Schneier and included in many cipher suites and encryption products. Blowfish provides a good encryption rate in software, and no effective cryptanalysis of it has been found to date for smaller files. It is recommended Blowfish should not be used to encrypt files larger than 4GB in size, Twofish should be used instead.
The MD5 message-digest algorithm is a widely used hash function producing a 128-bit hash value. MD5 was designed by Ronald Rivest in 1991 to replace an earlier hash function MD4, and was specified in 1992 as RFC 1321.
In cryptography, SHA-1 is a hash function which takes an input and produces a 160-bit (20-byte) hash value known as a message digest – typically rendered as 40 hexadecimal digits. It was designed by the United States National Security Agency, and is a U.S. Federal Information Processing Standard. The algorithm has been cryptographically broken but is still widely used.
In computing, endianness is the order in which bytes within a word of digital data are transmitted over a data communication medium or addressed in computer memory, counting only byte significance compared to earliness. Endianness is primarily expressed as big-endian (BE) or little-endian (LE), terms introduced by Danny Cohen into computer science for data ordering in an Internet Experiment Note published in 1980. The adjective endian has its origin in the writings of 18th century Anglo-Irish writer Jonathan Swift. In the 1726 novel Gulliver's Travels, he portrays the conflict between sects of Lilliputians divided into those breaking the shell of a boiled egg from the big end or from the little end. By analogy, a CPU may read a digital word big end first, or little end first.
In computer programming, a bitwise operation operates on a bit string, a bit array or a binary numeral at the level of its individual bits. It is a fast and simple action, basic to the higher-level arithmetic operations and directly supported by the processor. Most bitwise operations are presented as two-operand instructions where the result replaces one of the input operands.
Adler-32 is a checksum algorithm written by Mark Adler in 1995, modifying Fletcher's checksum. Compared to a cyclic redundancy check of the same length, it trades reliability for speed. Adler-32 is more reliable than Fletcher-16, and slightly less reliable than Fletcher-32.
SHA-2 is a set of cryptographic hash functions designed by the United States National Security Agency (NSA) and first published in 2001. They are built using the Merkle–Damgård construction, from a one-way compression function itself built using the Davies–Meyer structure from a specialized block cipher.
In cryptography, a universal hashing message authentication code, or UMAC, is a message authentication code (MAC) calculated using universal hashing, which involves choosing a hash function from a class of hash functions according to some secret (random) process and applying it to the message. The resulting digest or fingerprint is then encrypted to hide the identity of the hash function that was used. A variation of the scheme was first published in 1999. As with any MAC, it may be used to simultaneously verify both the data integrity and the authenticity of a message. In contrast to traditional MACs, which are serializable, a UMAC can be executed in parallel. Thus, as machines continue to offer more parallel-processing capabilities, the speed of implementing UMAC can increase.
The GUID Partition Table (GPT) is a standard for the layout of partition tables of a physical computer storage device, such as a hard disk drive or solid-state drive, using universally unique identifiers (UUIDs), which are also known as globally unique identifiers (GUIDs). Forming a part of the Unified Extensible Firmware Interface (UEFI) standard, it is nevertheless also used for some BIOSs, because of the limitations of master boot record (MBR) partition tables, which use 32 bits for logical block addressing (LBA) of traditional 512-byte disk sectors.
Salsa20 and the closely related ChaCha are stream ciphers developed by Daniel J. Bernstein. Salsa20, the original cipher, was designed in 2005, then later submitted to the eSTREAM European Union cryptographic validation process by Bernstein. ChaCha is a modification of Salsa20 published in 2008. It uses a new round function that increases diffusion and increases performance on some architectures.
The GOST hash function, defined in the standards GOST R 34.11-94 and GOST 34.311-95 is a 256-bit cryptographic hash function. It was initially defined in the Russian national standard GOST R 34.11-94 Information Technology – Cryptographic Information Security – Hash Function. The equivalent standard used by other member-states of the CIS is GOST 34.311-95.
A hash array mapped trie (HAMT) is an implementation of an associative array that combines the characteristics of a hash table and an array mapped trie. It is a refined version of the more general notion of a hash tree.
Computation of a cyclic redundancy check is derived from the mathematics of polynomial division, modulo two. In practice, it resembles long division of the binary message string, with a fixed number of zeroes appended, by the "generator polynomial" string except that exclusive or operations replace subtractions. Division of this type is efficiently realised in hardware by a modified shift register, and in software by a series of equivalent algorithms, starting with simple code close to the mathematics and becoming faster through byte-wise parallelism and space–time tradeoffs.
SHA-3 is the latest member of the Secure Hash Algorithm family of standards, released by NIST on August 5, 2015. Although part of the same series of standards, SHA-3 is internally different from the MD5-like structure of SHA-1 and SHA-2.
bcrypt is a password-hashing function designed by Niels Provos and David Mazières, based on the Blowfish cipher and presented at USENIX in 1999. Besides incorporating a salt to protect against rainbow table attacks, bcrypt is an adaptive function: over time, the iteration count can be increased to make it slower, so it remains resistant to brute-force search attacks even with increasing computation power.
The Jenkins hash functions are a family of non-cryptographic hash functions for multi-byte keys designed by Bob Jenkins. The first one was formally published in 1997.
BLAKE is a cryptographic hash function based on Daniel J. Bernstein's ChaCha stream cipher, but a permuted copy of the input block, XORed with round constants, is added before each ChaCha round. Like SHA-2, there are two variants differing in the word size. ChaCha operates on a 4×4 array of words. BLAKE repeatedly combines an 8-word hash value with 16 message words, truncating the ChaCha result to obtain the next hash value. BLAKE-256 and BLAKE-224 use 32-bit words and produce digest sizes of 256 bits and 224 bits, respectively, while BLAKE-512 and BLAKE-384 use 64-bit words and produce digest sizes of 512 bits and 384 bits, respectively.
A concurrent hash-trie or Ctrie is a concurrent thread-safe lock-free implementation of a hash array mapped trie. It is used to implement the concurrent map abstraction. It has particularly scalable concurrent insert and remove operations and is memory-efficient. It is the first known concurrent data-structure that supports O(1), atomic, lock-free snapshots.
In cryptography, scrypt is a password-based key derivation function created by Colin Percival in March 2009, originally for the Tarsnap online backup service. The algorithm was specifically designed to make it costly to perform large-scale custom hardware attacks by requiring large amounts of memory. In 2016, the scrypt algorithm was published by IETF as RFC 7914. A simplified version of scrypt is used as a proof-of-work scheme by a number of cryptocurrencies, first implemented by an anonymous programmer called ArtForz in Tenebrix and followed by Fairbrix and Litecoin soon after.
Argon2 is a key derivation function that was selected as the winner of the 2015 Password Hashing Competition. It was designed by Alex Biryukov, Daniel Dinu, and Dmitry Khovratovich from the University of Luxembourg. The reference implementation of Argon2 is released under a Creative Commons CC0 license or the Apache License 2.0, and provides three related versions:
{{cite web}}
: CS1 maint: numeric names: authors list (link)