Hash table

Last updated

Hash table
Type Unordered associative array
Invented1953
Time complexity in big O notation
OperationAverageWorst case
Search Θ(1) O(n)
Insert Θ(1) O(n)
Delete Θ(1) O(n)
Space complexity
Space Θ(n) [1] O(n)
A small phone book as a hash table Hash table 3 1 1 0 1 0 0 SP.svg
A small phone book as a hash table

In computer science, a hash table is a data structure that implements an associative array, also called a dictionary or simply map; an associative array is an abstract data type that maps keys to values. [2] A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored. A map implemented by a hash table is called a hash map.

Contents

Most hash table designs employ an imperfect hash function. Hash collisions, where the hash function generates the same index for more than one key, therefore typically must be accommodated in some way.

In a well-dimensioned hash table, the average time complexity for each lookup is independent of the number of elements stored in the table. Many hash table designs also allow arbitrary insertions and deletions of key–value pairs, at amortized constant average cost per operation. [3] [4] [5]

Hashing is an example of a space-time tradeoff. If memory is infinite, the entire key can be used directly as an index to locate its value with a single memory access. On the other hand, if infinite time is available, values can be stored without regard for their keys, and a binary search or linear search can be used to retrieve the element. [6] :458

In many situations, hash tables turn out to be on average more efficient than search trees or any other table lookup structure. For this reason, they are widely used in many kinds of computer software, particularly for associative arrays, database indexing, caches, and sets.

History

The idea of hashing arose independently in different places. In January 1953, Hans Peter Luhn wrote an internal IBM memorandum that used hashing with chaining. The first example of open addressing was proposed by A. D. Linh, building on Luhn's memorandum. [4] :547 Around the same time, Gene Amdahl, Elaine M. McGraw, Nathaniel Rochester, and Arthur Samuel of IBM Research implemented hashing for the IBM 701 assembler. [7] :124 Open addressing with linear probing is credited to Amdahl, although Andrey Ershov independently had the same idea. [7] :124–125 The term "open addressing" was coined by W. Wesley Peterson in his article which discusses the problem of search in large files. [8] :15

The first published work on hashing with chaining is credited to Arnold Dumey, who discussed the idea of using remainder modulo a prime as a hash function. [8] :15 The word "hashing" was first published in an article by Robert Morris. [7] :126 A theoretical analysis of linear probing was submitted originally by Konheim and Weiss. [8] :15

Overview

An associative array stores a set of (key, value) pairs and allows insertion, deletion, and lookup (search), with the constraint of unique keys. In the hash table implementation of associative arrays, an array of length is partially filled with elements, where . A value gets stored at an index location , where is a hash function, and . [8] :2 Under reasonable assumptions, hash tables have better time complexity bounds on search, delete, and insert operations in comparison to self-balancing binary search trees. [8] :1

Hash tables are also commonly used to implement sets, by omitting the stored value for each key and merely tracking whether the key is present. [8] :1

Load factor

A load factor is a critical statistic of a hash table, and is defined as follows: [1] where

The performance of the hash table deteriorates in relation to the load factor . [8] :2

The software typically ensures that the load factor remains below a certain constant, . This helps maintain good performance. Therefore, a common approach is to resize or "rehash" the hash table whenever the load factor reaches . Similarly the table may also be resized if the load factor drops below . [9]

Load factor for separate chaining

With separate chaining hash tables, each slot of the bucket array stores a pointer to a list or array of data. [10]

Separate chaining hash tables suffer gradually declining performance as the load factor grows, and no fixed point beyond which resizing is absolutely needed. [9]

With separate chaining, the value of that gives best performance is typically between 1 and 3. [9]

Load factor for open addressing

With open addressing, each slot of the bucket array holds exactly one item. Therefore an open-addressed hash table cannot have a load factor greater than 1. [10]

The performance of open addressing becomes very bad when the load factor approaches 1. [9]

Therefore a hash table that uses open addressing must be resized or rehashed if the load factor approaches 1. [9]

With open addressing, acceptable figures of max load factor should range around 0.6 to 0.75. [11] [12] :110

Hash function

A hash function maps the universe of keys to indices or slots within the table, that is, for . The conventional implementations of hash functions are based on the integer universe assumption that all elements of the table stem from the universe , where the bit length of is confined within the word size of a computer architecture. [8] :2

A hash function is said to be perfect for a given set if it is injective on , that is, if each element maps to a different value in . [13] [14] A perfect hash function can be created if all the keys are known ahead of time. [13]

Integer universe assumption

The schemes of hashing used in integer universe assumption include hashing by division, hashing by multiplication, universal hashing, dynamic perfect hashing, and static perfect hashing. [8] :2 However, hashing by division is the commonly used scheme. [15] :264 [12] :110

Hashing by division

The scheme in hashing by division is as follows: [8] :2 where is the hash value of and is the size of the table.

Hashing by multiplication

The scheme in hashing by multiplication is as follows: [8] :2–3 Where is a non-integer real-valued constant and is the size of the table. An advantage of the hashing by multiplication is that the is not critical. [8] :2–3 Although any value produces a hash function, Donald Knuth suggests using the golden ratio. [8] :3

Choosing a hash function

Uniform distribution of the hash values is a fundamental requirement of a hash function. A non-uniform distribution increases the number of collisions and the cost of resolving them. Uniformity is sometimes difficult to ensure by design, but may be evaluated empirically using statistical tests, e.g., a Pearson's chi-squared test for discrete uniform distributions. [16] [17]

The distribution needs to be uniform only for table sizes that occur in the application. In particular, if one uses dynamic resizing with exact doubling and halving of the table size, then the hash function needs to be uniform only when the size is a power of two. Here the index can be computed as some range of bits of the hash function. On the other hand, some hashing algorithms prefer to have the size be a prime number. [18]

For open addressing schemes, the hash function should also avoid clustering , the mapping of two or more keys to consecutive slots. Such clustering may cause the lookup cost to skyrocket, even if the load factor is low and collisions are infrequent. The popular multiplicative hash is claimed to have particularly poor clustering behavior. [18] [4]

K-independent hashing offers a way to prove a certain hash function does not have bad keysets for a given type of hashtable. A number of K-independence results are known for collision resolution schemes such as linear probing and cuckoo hashing. Since K-independence can prove a hash function works, one can then focus on finding the fastest possible such hash function. [19]

Collision resolution

A search algorithm that uses hashing consists of two parts. The first part is computing a hash function which transforms the search key into an array index. The ideal case is such that no two search keys hashes to the same array index. However, this is not always the case and is impossible to guarantee for unseen given data. [20] :515 Hence the second part of the algorithm is collision resolution. The two common methods for collision resolution are separate chaining and open addressing. [6] :458

Separate chaining

Hash collision resolved by separate chaining Hash table 5 0 1 1 1 1 1 LL.svg
Hash collision resolved by separate chaining
Hash collision by separate chaining with head records in the bucket array. Hash table 5 0 1 1 1 1 0 LL.svg
Hash collision by separate chaining with head records in the bucket array.

In separate chaining, the process involves building a linked list with key–value pair for each search array index. The collided items are chained together through a single linked list, which can be traversed to access the item with a unique search key. [6] :464 Collision resolution through chaining with linked list is a common method of implementation of hash tables. Let and be the hash table and the node respectively, the operation involves as follows: [15] :258

Chained-Hash-Insert(T, k)   insertxat the head of linked listT[h(k)]  Chained-Hash-Search(T, k)   search for an element with keykin linked listT[h(k)]  Chained-Hash-Delete(T, k)   deletexfrom the linked listT[h(k)]

If the element is comparable either numerically or lexically, and inserted into the list by maintaining the total order, it results in faster termination of the unsuccessful searches. [20] :520–521

Other data structures for separate chaining

If the keys are ordered, it could be efficient to use "self-organizing" concepts such as using a self-balancing binary search tree, through which the theoretical worst case could be brought down to , although it introduces additional complexities. [20] :521

In dynamic perfect hashing, two-level hash tables are used to reduce the look-up complexity to be a guaranteed in the worst case. In this technique, the buckets of entries are organized as perfect hash tables with slots providing constant worst-case lookup time, and low amortized time for insertion. [21] A study shows array-based separate chaining to be 97% more performant when compared to the standard linked list method under heavy load. [22] :99

Techniques such as using fusion tree for each buckets also result in constant time for all operations with high probability. [23]

Caching and locality of reference

The linked list of separate chaining implementation may not be cache-conscious due to spatial localitylocality of reference—when the nodes of the linked list are scattered across memory, thus the list traversal during insert and search may entail CPU cache inefficiencies. [22] :91

In cache-conscious variants of collision resolution through separate chaining, a dynamic array found to be more cache-friendly is used in the place where a linked list or self-balancing binary search trees is usually deployed, since the contiguous allocation pattern of the array could be exploited by hardware-cache prefetchers—such as translation lookaside buffer—resulting in reduced access time and memory consumption. [24] [25] [26]

Open addressing

Hash collision resolved by open addressing with linear probing (interval=1). Note that "Ted Baker" has a unique hash, but nevertheless collided with "Sandra Dee", that had previously collided with "John Smith". Hash table 5 0 1 1 1 1 0 SP.svg
Hash collision resolved by open addressing with linear probing (interval=1). Note that "Ted Baker" has a unique hash, but nevertheless collided with "Sandra Dee", that had previously collided with "John Smith".
This graph compares the average number of CPU cache misses required to look up elements in large hash tables (far exceeding size of the cache) with chaining and linear probing. Linear probing performs better due to better locality of reference, though as the table gets full, its performance degrades drastically. Hash table average insertion time.png
This graph compares the average number of CPU cache misses required to look up elements in large hash tables (far exceeding size of the cache) with chaining and linear probing. Linear probing performs better due to better locality of reference, though as the table gets full, its performance degrades drastically.

Open addressing is another collision resolution technique in which every entry record is stored in the bucket array itself, and the hash resolution is performed through probing. When a new entry has to be inserted, the buckets are examined, starting with the hashed-to slot and proceeding in some probe sequence, until an unoccupied slot is found. When searching for an entry, the buckets are scanned in the same sequence, until either the target record is found, or an unused array slot is found, which indicates an unsuccessful search. [27]

Well-known probe sequences include:

The performance of open addressing may be slower compared to separate chaining since the probe sequence increases when the load factor approaches 1. [9] [22] :93 The probing results in an infinite loop if the load factor reaches 1, in the case of a completely filled table. [6] :471 The average cost of linear probing depends on the hash function's ability to distribute the elements uniformly throughout the table to avoid clustering, since formation of clusters would result in increased search time. [6] :472

Caching and locality of reference

Since the slots are located in successive locations, linear probing could lead to better utilization of CPU cache due to locality of references resulting in reduced memory latency. [28]

Other collision resolution techniques based on open addressing

Coalesced hashing

Coalesced hashing is a hybrid of both separate chaining and open addressing in which the buckets or nodes link within the table. [30] :6–8 The algorithm is ideally suited for fixed memory allocation. [30] :4 The collision in coalesced hashing is resolved by identifying the largest-indexed empty slot on the hash table, then the colliding value is inserted into that slot. The bucket is also linked to the inserted node's slot which contains its colliding hash address. [30] :8

Cuckoo hashing

Cuckoo hashing is a form of open addressing collision resolution technique which guarantees worst-case lookup complexity and constant amortized time for insertions. The collision is resolved through maintaining two hash tables, each having its own hashing function, and collided slot gets replaced with the given item, and the preoccupied element of the slot gets displaced into the other hash table. The process continues until every key has its own spot in the empty buckets of the tables; if the procedure enters into infinite loop—which is identified through maintaining a threshold loop counter—both hash tables get rehashed with newer hash functions and the procedure continues. [31] :124–125

Hopscotch hashing

Hopscotch hashing is an open addressing based algorithm which combines the elements of cuckoo hashing, linear probing and chaining through the notion of a neighbourhood of buckets—the subsequent buckets around any given occupied bucket, also called a "virtual" bucket. [32] :351–352 The algorithm is designed to deliver better performance when the load factor of the hash table grows beyond 90%; it also provides high throughput in concurrent settings, thus well suited for implementing resizable concurrent hash table. [32] :350 The neighbourhood characteristic of hopscotch hashing guarantees a property that, the cost of finding the desired item from any given buckets within the neighbourhood is very close to the cost of finding it in the bucket itself; the algorithm attempts to be an item into its neighbourhood—with a possible cost involved in displacing other items. [32] :352

Each bucket within the hash table includes an additional "hop-information"—an H-bit bit array for indicating the relative distance of the item which was originally hashed into the current virtual bucket within H-1 entries. [32] :352 Let and be the key to be inserted and bucket to which the key is hashed into respectively; several cases are involved in the insertion procedure such that the neighbourhood property of the algorithm is vowed: [32] :352–353 if is empty, the element is inserted, and the leftmost bit of bitmap is set to 1; if not empty, linear probing is used for finding an empty slot in the table, the bitmap of the bucket gets updated followed by the insertion; if the empty slot is not within the range of the neighbourhood, i.e. H-1, subsequent swap and hop-info bit array manipulation of each bucket is performed in accordance with its neighbourhood invariant properties. [32] :353

Robin Hood hashing

Robin Hood hashing is an open addressing based collision resolution algorithm; the collisions are resolved through favouring the displacement of the element that is farthest—or longest probe sequence length (PSL)—from its "home location" i.e. the bucket to which the item was hashed into. [33] :12 Although Robin Hood hashing does not change the theoretical search cost, it significantly affects the variance of the distribution of the items on the buckets, [34] :2 i.e. dealing with cluster formation in the hash table. [35] Each node within the hash table that uses Robin Hood hashing should be augmented to store an extra PSL value. [36] Let be the key to be inserted, be the (incremental) PSL length of , be the hash table and be the index, the insertion procedure is as follows: [33] :12–13 [37] :5

  • If : the iteration goes into the next bucket without attempting an external probe.
  • If : insert the item into the bucket ; swap with —let it be ; continue the probe from the st bucket to insert ; repeat the procedure until every element is inserted.

Dynamic resizing

Repeated insertions cause the number of entries in a hash table to grow, which consequently increases the load factor; to maintain the amortized performance of the lookup and insertion operations, a hash table is dynamically resized and the items of the tables are rehashed into the buckets of the new hash table, [9] since the items cannot be copied over as varying table sizes results in different hash value due to modulo operation. [38] If a hash table becomes "too empty" after deleting some elements, resizing may be performed to avoid excessive memory usage. [39]

Resizing by moving all entries

Generally, a new hash table with a size double that of the original hash table gets allocated privately and every item in the original hash table gets moved to the newly allocated one by computing the hash values of the items followed by the insertion operation. Rehashing is simple, but computationally expensive. [40] :478–479

Alternatives to all-at-once rehashing

Some hash table implementations, notably in real-time systems, cannot pay the price of enlarging the hash table all at once, because it may interrupt time-critical operations. If one cannot avoid dynamic resizing, a solution is to perform the resizing gradually to avoid storage blip—typically at 50% of new table's size—during rehashing and to avoid memory fragmentation that triggers heap compaction due to deallocation of large memory blocks caused by the old hash table. [41] :2–3 In such case, the rehashing operation is done incrementally through extending prior memory block allocated for the old hash table such that the buckets of the hash table remain unaltered. A common approach for amortized rehashing involves maintaining two hash functions and . The process of rehashing a bucket's items in accordance with the new hash function is termed as cleaning, which is implemented through command pattern by encapsulating the operations such as , and through a wrapper such that each element in the bucket gets rehashed and its procedure involve as follows: [41] :3

Linear hashing

Linear hashing is an implementation of the hash table which enables dynamic growths or shrinks of the table one bucket at a time. [42]

Performance

The performance of a hash table is dependent on the hash function's ability in generating quasi-random numbers () for entries in the hash table where , and denotes the key, number of buckets and the hash function such that . If the hash function generates the same for distinct keys (), this results in collision, which is dealt with in a variety of ways. The constant time complexity () of the operation in a hash table is presupposed on the condition that the hash function doesn't generate colliding indices; thus, the performance of the hash table is directly proportional to the chosen hash function's ability to disperse the indices. [43] :1 However, construction of such a hash function is practically infeasible, that being so, implementations depend on case-specific collision resolution techniques in achieving higher performance. [43] :2

Applications

Associative arrays

Hash tables are commonly used to implement many types of in-memory tables. They are used to implement associative arrays. [29]

Database indexing

Hash tables may also be used as disk-based data structures and database indices (such as in dbm) although B-trees are more popular in these applications. [44]

Caches

Hash tables can be used to implement caches, auxiliary data tables that are used to speed up the access to data that is primarily stored in slower media. In this application, hash collisions can be handled by discarding one of the two colliding entries—usually erasing the old item that is currently stored in the table and overwriting it with the new item, so every item in the table has a unique hash value. [45] [46]

Sets

Hash tables can be used in the implementation of set data structure, which can store unique values without any particular order; set is typically used in testing the membership of a value in the collection, rather than element retrieval. [47]

Transposition table

A transposition table to a complex Hash Table which stores information about each section that has been searched. [48]

Implementations

Many programming languages provide hash table functionality, either as built-in associative arrays or as standard library modules.

In JavaScript, an "object" is a mutable collection of key-value pairs (called "properties"), where each key is either a string or a guaranteed-unique "symbol"; any other value, when used as a key, is first coerced to a string. Aside from the seven "primitive" data types, every value in JavaScript is an object. [49] ECMAScript 2015 also added the Map data structure, which accepts arbitrary values as keys. [50]

C++11 includes unordered_map in its standard library for storing keys and values of arbitrary types. [51]

Go's built-in map implements a hash table in the form of a type. [52]

Java programming language includes the HashSet, HashMap, LinkedHashSet, and LinkedHashMap generic collections. [53]

Python's built-in dict implements a hash table in the form of a type. [54]

Ruby's built-in Hash uses the open addressing model from Ruby 2.4 onwards. [55]

Rust programming language includes HashMap, HashSet as part of the Rust Standard Library. [56]

The .NET standard library includes HashSet and Dictionary, [57] [58] so it can be used from languages such as C# and VB.NET. [59]

See also

Related Research Articles

<span class="mw-page-title-main">Hash function</span> Mapping arbitrary data to fixed-size values

A hash function is any function that can be used to map data of arbitrary size to fixed-size values, though there are some hash functions that support variable-length output. The values returned by a hash function are called hash values, hash codes, hash digests, digests, or simply hashes. The values are usually used to index a fixed-size table called a hash table. Use of a hash function to index a hash table is called hashing or scatter-storage addressing.

In computer science, an associative array, map, symbol table, or dictionary is an abstract data type that stores a collection of pairs, such that each possible key appears at most once in the collection. In mathematical terms, an associative array is a function with finite domain. It supports 'lookup', 'remove', and 'insert' operations.

<span class="mw-page-title-main">Perfect hash function</span> Hash function without any collisions

In computer science, a perfect hash functionh for a set S is a hash function that maps distinct elements in S to a set of m integers, with no collisions. In mathematical terms, it is an injective function.

In computer science, a lookup table (LUT) is an array that replaces runtime computation with a simpler array indexing operation, in a process termed as direct addressing. The savings in processing time can be significant, because retrieving a value from memory is often faster than carrying out an "expensive" computation or input/output operation. The tables may be precalculated and stored in static program storage, calculated as part of a program's initialization phase (memoization), or even stored in hardware in application-specific platforms. Lookup tables are also used extensively to validate input values by matching against a list of valid items in an array and, in some programming languages, may include pointer functions to process the matching input. FPGAs also make extensive use of reconfigurable, hardware-implemented, lookup tables to provide programmable hardware functionality. LUTs differ from hash tables in a way that, to retrieve a value with key , a hash table would store the value in the slot where is a hash function i.e. is used to compute the slot, while in the case of LUT, the value is stored in slot , thus directly addressable.

Kademlia is a distributed hash table for decentralized peer-to-peer computer networks designed by Petar Maymounkov and David Mazières in 2002. It specifies the structure of the network and the exchange of information through node lookups. Kademlia nodes communicate among themselves using UDP. A virtual or overlay network is formed by the participant nodes. Each node is identified by a number or node ID. The node ID serves not only as identification, but the Kademlia algorithm uses the node ID to locate values.

Double hashing is a computer programming technique used in conjunction with open addressing in hash tables to resolve hash collisions, by using a secondary hash of the key as an offset when a collision occurs. Double hashing with open addressing is a classical data structure on a table .

<span class="mw-page-title-main">Open addressing</span> Hash collision resolution technique

Open addressing, or closed hashing, is a method of collision resolution in hash tables. With this method a hash collision is resolved by probing, or searching through alternative locations in the array until either the target record is found, or an unused array slot is found, which indicates that there is no such key in the table. Well-known probe sequences include:

<span class="mw-page-title-main">Linear probing</span> Computer programming method for hashing

Linear probing is a scheme in computer programming for resolving collisions in hash tables, data structures for maintaining a collection of key–value pairs and looking up the value associated with a given key. It was invented in 1954 by Gene Amdahl, Elaine M. McGraw, and Arthur Samuel and first analyzed in 1963 by Donald Knuth.

Quadratic probing is an open addressing scheme in computer programming for resolving hash collisions in hash tables. Quadratic probing operates by taking the original hash index and adding successive values of an arbitrary quadratic polynomial until an open slot is found.

<span class="mw-page-title-main">Coalesced hashing</span> Hash table collision resolution strategy

Coalesced hashing, also called coalesced chaining, is a strategy of collision resolution in a hash table that forms a hybrid of separate chaining and open addressing.

In computer science, consistent hashing is a special kind of hashing technique such that when a hash table is resized, only keys need to be remapped on average where is the number of keys and is the number of slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped because the mapping between the keys and the slots is defined by a modular operation.

<span class="mw-page-title-main">Cuckoo hashing</span> Data structure hashing scheme

Cuckoo hashing is a scheme in computer programming for resolving hash collisions of values of hash functions in a table, with worst-case constant lookup time. The name derives from the behavior of some species of cuckoo, where the cuckoo chick pushes the other eggs or young out of the nest when it hatches in a variation of the behavior referred to as brood parasitism; analogously, inserting a new key into a cuckoo hashing table may push an older key to a different location in the table.

In mathematics and computing, universal hashing refers to selecting a hash function at random from a family of hash functions with a certain mathematical property. This guarantees a low number of collisions in expectation, even if the data is chosen by an adversary. Many universal families are known, and their evaluation is often very efficient. Universal hashing has numerous uses in computer science, for example in implementations of hash tables, randomized algorithms, and cryptography.

2-choice hashing, also known as 2-choice chaining, is "a variant of a hash table in which keys are added by hashing with two hash functions. The key is put in the array position with the fewer (colliding) keys. Some collision resolution scheme is needed, unless keys are kept in buckets. The average-case cost of a successful search is , where is the number of keys and is the size of the array. The most collisions is with high probability."

Extendible hashing is a type of hash system which treats a hash as a bit string and uses a trie for bucket lookup. Because of the hierarchical nature of the system, re-hashing is an incremental operation. This means that time-sensitive applications are less affected by table growth than by standard full-table rehashes.

In computer science, SUHA is a basic assumption that facilitates the mathematical analysis of hash tables. The assumption states that a hypothetical hashing function will evenly distribute items into the slots of a hash table. Moreover, each item to be hashed has an equal probability of being placed into a slot, regardless of the other elements already placed. This assumption generalizes the details of the hash function and allows for certain assumptions about the stochastic system.

In computer science, dynamic perfect hashing is a programming technique for resolving collisions in a hash table data structure. While more memory-intensive than its hash table counterparts, this technique is useful for situations where fast queries, insertions, and deletions must be made on a large set of elements.

<span class="mw-page-title-main">Hopscotch hashing</span>

Hopscotch hashing is a scheme in computer programming for resolving hash collisions of values of hash functions in a table using open addressing. It is also well suited for implementing a concurrent hash table. Hopscotch hashing was introduced by Maurice Herlihy, Nir Shavit and Moran Tzafrir in 2008. The name is derived from the sequence of hops that characterize the table's insertion algorithm.

In computer science, a family of hash functions is said to be k-independent, k-wise independent or k-universal if selecting a function at random from the family guarantees that the hash codes of any designated k keys are independent random variables. Such families allow good average case performance in randomized algorithms or data structures, even if the input data is chosen by an adversary. The trade-offs between the degree of independence and the efficiency of evaluating the hash function are well studied, and many k-independent families have been proposed.

In computer science, tabulation hashing is a method for constructing universal families of hash functions by combining table lookup with exclusive or operations. It was first studied in the form of Zobrist hashing for computer games; later work by Carter and Wegman extended this method to arbitrary fixed-length keys. Generalizations of tabulation hashing have also been developed that can handle variable-length keys such as text strings.

References

  1. 1 2 Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009). Introduction to Algorithms (3rd ed.). Massachusetts Institute of Technology. pp. 253–280. ISBN   978-0-262-03384-8.
  2. Mehlhorn, Kurt; Sanders, Peter (2008). "Hash Tables and Associative Arrays" (PDF). Algorithms and Data Structures. Springer. pp. 81–98. doi:10.1007/978-3-540-77978-0_4. ISBN   978-3-540-77977-3.
  3. Leiserson, Charles E. (Fall 2005). "Lecture 13: Amortized Algorithms, Table Doubling, Potential Method". course MIT 6.046J/18.410J Introduction to Algorithms. Archived from the original on August 7, 2009.
  4. 1 2 3 Knuth, Donald (1998). The Art of Computer Programming. Vol. 3: Sorting and Searching (2nd ed.). Addison-Wesley. pp. 513–558. ISBN   978-0-201-89685-5.
  5. Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001). "Chapter 11: Hash Tables". Introduction to Algorithms (2nd ed.). MIT Press and McGraw-Hill. pp.  221–252. ISBN   978-0-262-53196-2.
  6. 1 2 3 4 5 Sedgewick, Robert; Wayne, Kevin (2011). Algorithms. Vol. 1 (4 ed.). Addison-Wesley Professional via Princeton University, Department of Computer Science.
  7. 1 2 3 Konheim, Alan G. (2010). Hashing in Computer Science. doi:10.1002/9780470630617. ISBN   978-0-470-34473-6.
  8. 1 2 3 4 5 6 7 8 9 10 11 12 13 Mehta, Dinesh P.; Mehta, Dinesh P.; Sahni, Sartaj, eds. (2004). Handbook of Data Structures and Applications. doi:10.1201/9781420035179. ISBN   978-0-429-14701-2.
  9. 1 2 3 4 5 6 7 Mayers, Andrew (2008). "CS 312: Hash tables and amortized analysis". Cornell University, Department of Computer Science. Archived from the original on April 26, 2021. Retrieved October 26, 2021 via cs.cornell.edu.
  10. 1 2 James S. Plank and Brad Vander Zanden. "CS140 Lecture notes -- Hashing".
  11. Maurer, W. D.; Lewis, T. G. (March 1975). "Hash Table Methods". ACM Computing Surveys. 7 (1): 5–19. doi:10.1145/356643.356645. S2CID   17874775.
  12. 1 2 Owolabi, Olumide (February 2003). "Empirical studies of some hashing functions". Information and Software Technology. 45 (2): 109–112. doi:10.1016/S0950-5849(02)00174-X.
  13. 1 2 Lu, Yi; Prabhakar, Balaji; Bonomi, Flavio (2006). Perfect Hashing for Network Applications. 2006 IEEE International Symposium on Information Theory. pp. 2774–2778. doi:10.1109/ISIT.2006.261567. ISBN   1-4244-0505-X. S2CID   1494710.
  14. Belazzougui, Djamal; Botelho, Fabiano C.; Dietzfelbinger, Martin (2009). "Hash, displace, and compress" (PDF). Algorithms—ESA 2009: 17th Annual European Symposium, Copenhagen, Denmark, September 7–9, 2009, Proceedings. Lecture Notes in Computer Science. Vol. 5757. Berlin: Springer. pp. 682–693. CiteSeerX   10.1.1.568.130 . doi:10.1007/978-3-642-04128-0_61. MR   2557794.
  15. 1 2 Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001). "Chapter 11: Hash Tables". Introduction to Algorithms (2nd ed.). Massachusetts Institute of Technology. ISBN   978-0-262-53196-2.
  16. Pearson, Karl (1900). "On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling". Philosophical Magazine. Series 5. 50 (302): 157–175. doi:10.1080/14786440009463897.
  17. Plackett, Robin (1983). "Karl Pearson and the Chi-Squared Test". International Statistical Review. 51 (1): 59–72. doi:10.2307/1402731. JSTOR   1402731.
  18. 1 2 Wang, Thomas (March 1997). "Prime Double Hash Table". Archived from the original on September 3, 1999. Retrieved May 10, 2015.
  19. Wegman, Mark N.; Carter, J.Lawrence (June 1981). "New hash functions and their use in authentication and set equality". Journal of Computer and System Sciences. 22 (3): 265–279. doi: 10.1016/0022-0000(81)90033-7 .
  20. 1 2 3 Donald E. Knuth (April 24, 1998). The Art of Computer Programming: Volume 3: Sorting and Searching. Addison-Wesley Professional. ISBN   978-0-201-89685-5.
  21. Demaine, Erik; Lind, Jeff (Spring 2003). "Lecture 2" (PDF). 6.897: Advanced Data Structures. MIT Computer Science and Artificial Intelligence Laboratory. Archived (PDF) from the original on June 15, 2010. Retrieved June 30, 2008.
  22. 1 2 3 Culpepper, J. Shane; Moffat, Alistair (2005). "Enhanced Byte Codes with Restricted Prefix Properties". String Processing and Information Retrieval. Lecture Notes in Computer Science. Vol. 3772. pp. 1–12. doi:10.1007/11575832_1. ISBN   978-3-540-29740-6.
  23. Willard, Dan E. (2000). "Examining computational geometry, van Emde Boas trees, and hashing from the perspective of the fusion tree". SIAM Journal on Computing . 29 (3): 1030–1049. doi:10.1137/S0097539797322425. MR   1740562..
  24. Askitis, Nikolas; Sinha, Ranjan (October 2010). "Engineering scalable, cache and space efficient tries for strings". The VLDB Journal. 19 (5): 633–660. doi:10.1007/s00778-010-0183-9.
  25. Askitis, Nikolas; Zobel, Justin (October 2005). "Cache-conscious Collision Resolution in String Hash Tables". Proceedings of the 12th International Conference, String Processing and Information Retrieval (SPIRE 2005). Vol. 3772/2005. pp. 91–102. doi:10.1007/11575832_11. ISBN   978-3-540-29740-6.
  26. Askitis, Nikolas (2009). "Fast and Compact Hash Tables for Integer Keys" (PDF). Proceedings of the 32nd Australasian Computer Science Conference (ACSC 2009). Vol. 91. pp. 113–122. ISBN   978-1-920682-72-9. Archived from the original (PDF) on February 16, 2011. Retrieved June 13, 2010.
  27. Tenenbaum, Aaron M.; Langsam, Yedidyah; Augenstein, Moshe J. (1990). Data Structures Using C. Prentice Hall. pp. 456–461, p. 472. ISBN   978-0-13-199746-2.
  28. 1 2 Pagh, Rasmus; Rodler, Flemming Friche (2001). "Cuckoo Hashing". Algorithms — ESA 2001. Lecture Notes in Computer Science. Vol. 2161. pp. 121–133. CiteSeerX   10.1.1.25.4189 . doi:10.1007/3-540-44676-1_10. ISBN   978-3-540-42493-2.
  29. 1 2 3 Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001), "11 Hash Tables", Introduction to Algorithms (2nd ed.), MIT Press and McGraw-Hill, pp. 221–252, ISBN   0-262-03293-7 .
  30. 1 2 3 Vitter, Jeffery S.; Chen, Wen-Chin (1987). The design and analysis of coalesced hashing . New York, United States: Oxford University Press. ISBN   978-0-19-504182-8 via Archive.org.
  31. Pagh, Rasmus; Rodler, Flemming Friche (2001). "Cuckoo Hashing". Algorithms — ESA 2001. Lecture Notes in Computer Science. Vol. 2161. pp. 121–133. CiteSeerX   10.1.1.25.4189 . doi:10.1007/3-540-44676-1_10. ISBN   978-3-540-42493-2.
  32. 1 2 3 4 5 6 Herlihy, Maurice; Shavit, Nir; Tzafrir, Moran (2008). "Hopscotch Hashing". Distributed Computing. Lecture Notes in Computer Science. Vol. 5218. pp. 350–364. doi:10.1007/978-3-540-87779-0_24. ISBN   978-3-540-87778-3.
  33. 1 2 Celis, Pedro (1986). Robin Hood Hashing (PDF). Ontario, Canada: University of Waterloo, Dept. of Computer Science. ISBN   978-0-315-29700-5. OCLC   14083698. Archived (PDF) from the original on November 1, 2021. Retrieved November 2, 2021.
  34. Poblete, P. V.; Viola, A. (July 2019). "Analysis of Robin Hood and Other Hashing Algorithms Under the Random Probing Model, With and Without Deletions". Combinatorics, Probability and Computing. 28 (4): 600–617. doi:10.1017/S0963548318000408. S2CID   125374363.
  35. Clarkson, Michael (2014). "Lecture 13: Hash tables". Cornell University, Department of Computer Science. Archived from the original on October 7, 2021. Retrieved November 1, 2021 via cs.cornell.edu.
  36. Gries, David (2017). "JavaHyperText and Data Structure: Robin Hood Hashing" (PDF). Cornell University, Department of Computer Science. Archived (PDF) from the original on April 26, 2021. Retrieved November 2, 2021 via cs.cornell.edu.
  37. Celis, Pedro (March 28, 1988). External Robin Hood Hashing (PDF) (Technical report). Bloomington, Indiana: Indiana University, Department of Computer Science. 246. Archived (PDF) from the original on November 3, 2021. Retrieved November 2, 2021.
  38. Goddard, Wayne (2021). "Chapter C5: Hash Tables" (PDF). Clemson University. pp. 15–16. Retrieved December 4, 2023.
  39. Devadas, Srini; Demaine, Erik (February 25, 2011). "Intro to Algorithms: Resizing Hash Tables" (PDF). Massachusetts Institute of Technology, Department of Computer Science. Archived (PDF) from the original on May 7, 2021. Retrieved November 9, 2021 via MIT OpenCourseWare.
  40. Thareja, Reema (2014). "Hashing and Collision". Data Structures Using C. Oxford University Press. pp. 464–488. ISBN   978-0-19-809930-7.
  41. 1 2 Friedman, Scott; Krishnan, Anand; Leidefrost, Nicholas (March 18, 2003). "Hash Tables for Embedded and Real-time systems" (PDF). All Computer Science and Engineering Research. Washington University in St. Louis. doi:10.7936/K7WD3XXV. Archived (PDF) from the original on June 9, 2021. Retrieved November 9, 2021 via Northwestern University, Department of Computer Science.
  42. Litwin, Witold (1980). "Linear hashing: A new tool for file and table addressing" (PDF). Proc. 6th Conference on Very Large Databases. Carnegie Mellon University. pp. 212–223. Archived (PDF) from the original on May 6, 2021. Retrieved November 10, 2021 via cs.cmu.edu.
  43. 1 2 Dijk, Tom Van (2010). "Analysing and Improving Hash Table Performance" (PDF). Netherlands: University of Twente. Archived (PDF) from the original on November 6, 2021. Retrieved December 31, 2021.
  44. Lech Banachowski. "Indexes and external sorting". pl:Polsko-Japońska Akademia Technik Komputerowych. Archived from the original on March 26, 2022. Retrieved March 26, 2022.
  45. Zhong, Liang; Zheng, Xueqian; Liu, Yong; Wang, Mengting; Cao, Yang (February 2020). "Cache hit ratio maximization in device-to-device communications overlaying cellular networks". China Communications. 17 (2): 232–238. doi:10.23919/jcc.2020.02.018. S2CID   212649328.
  46. Bottommley, James (January 1, 2004). "Understanding Caching". Linux Journal. Archived from the original on December 4, 2020. Retrieved April 16, 2022.
  47. Jill Seaman (2014). "Set & Hash Tables" (PDF). Texas State University. Archived from the original on April 1, 2022. Retrieved March 26, 2022.{{cite web}}: CS1 maint: bot: original URL status unknown (link)
  48. "Transposition Table - Chessprogramming wiki". chessprogramming.org. Archived from the original on February 14, 2021. Retrieved May 1, 2020.
  49. "JavaScript data types and data structures - JavaScript | MDN". developer.mozilla.org. Retrieved July 24, 2022.
  50. "Map - JavaScript | MDN". developer.mozilla.org. June 20, 2023. Retrieved July 15, 2023.
  51. "Programming language C++ - Technical Specification" (PDF). International Organization for Standardization. pp. 812–813. Archived from the original (PDF) on January 21, 2022. Retrieved February 8, 2022.
  52. "The Go Programming Language Specification". go.dev. Retrieved January 1, 2023.
  53. "Lesson: Implementations (The Java™ Tutorials > Collections)". docs.oracle.com. Archived from the original on January 18, 2017. Retrieved April 27, 2018.
  54. Zhang, Juan; Jia, Yunwei (2020). "Redis rehash optimization based on machine learning". Journal of Physics: Conference Series . 1453 (1): 3. Bibcode:2020JPhCS1453a2048Z. doi: 10.1088/1742-6596/1453/1/012048 . S2CID   215943738.
  55. Jonan Scheffler (December 25, 2016). "Ruby 2.4 Released: Faster Hashes, Unified Integers and Better Rounding". heroku.com. Archived from the original on July 3, 2019. Retrieved July 3, 2019.
  56. "doc.rust-lang.org". Archived from the original on December 8, 2022. Retrieved December 14, 2022.
  57. "HashSet Class (System.Collections.Generic)". learn.microsoft.com. Retrieved July 1, 2023.
  58. dotnet-bot. "Dictionary Class (System.Collections.Generic)". learn.microsoft.com. Retrieved January 16, 2024.
  59. "VB.NET HashSet Example". Dot Net Perls.

Further reading