Concurrent hash table

Concurrent accesses to the same hash table.

A concurrent hash table or concurrent hash map is an implementation of hash tables allowing concurrent access by multiple threads using a hash function. [1] [2]


Concurrent hash tables represent a key concurrent data structure for use in concurrent computing, allowing multiple threads to more efficiently cooperate in a computation over shared data. [1]

Due to the natural problems associated with concurrent access, namely contention, the way and scope in which the table can be concurrently accessed differs depending on the implementation. Furthermore, the resulting speedup might not be linear with the number of threads used, since contention needs to be resolved, producing processing overhead. [1] There exist multiple solutions to mitigate the effects of contention, each of which preserves the correctness of operations on the table. [1] [2] [3] [4]

As with their sequential counterparts, concurrent hash tables can be generalized and extended to fit broader applications, such as allowing more complex data types to be used for keys and values. These generalizations can, however, negatively impact performance and should thus be chosen in accordance with the requirements of the application. [1]

Concurrent hashing

When creating concurrent hash tables, the functions accessing the table with the chosen hashing algorithm need to be adapted for concurrency by adding a conflict resolution strategy. Such a strategy requires managing accesses in a way that conflicts caused by them do not result in corrupt data, while ideally increasing their efficiency when used in parallel. Herlihy and Shavit [5] describe how the accesses to a hash table without such a strategy, in their example based on a basic implementation of the cuckoo hashing algorithm, can be adapted for concurrent use. Fan et al. [6] further describe a table access scheme based on cuckoo hashing that is not only concurrent, but also preserves the space efficiency of its hashing function while improving cache locality as well as the throughput of insertions.

When hash tables are not bound in size and are thus allowed to grow/shrink when necessary, the hashing algorithm needs to be adapted to allow this operation. This entails modifying the used hash function to reflect the new key-space of the resized table. A concurrent growing algorithm is described by Maier et al. [1]
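The re-hashing requirement can be illustrated with a minimal, deliberately sequential Java sketch (all class and method names here are illustrative, not taken from the cited work): because each bucket index is computed from the current capacity, growing the table invalidates every previously computed index, so all entries must be migrated. A concurrent growing algorithm such as that of Maier et al. [1] additionally has to coordinate this migration between threads.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal, single-threaded sketch: the bucket index depends on the current
// capacity, so growing the table changes the key space of the hash mapping
// and every entry has to be re-placed.
class GrowableTable<K, V> {
    private static class Entry<K, V> {
        final K key; final V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private List<List<Entry<K, V>>> buckets;

    GrowableTable(int capacity) { buckets = newBuckets(capacity); }

    private static <K, V> List<List<Entry<K, V>>> newBuckets(int n) {
        List<List<Entry<K, V>>> b = new ArrayList<>(n);
        for (int i = 0; i < n; i++) b.add(new ArrayList<>());
        return b;
    }

    void put(K key, V value) {
        // The index computation depends on the current capacity.
        int i = Math.floorMod(key.hashCode(), buckets.size());
        buckets.get(i).add(new Entry<>(key, value));
    }

    // Growing re-hashes every entry under the new capacity.
    void grow() {
        List<List<Entry<K, V>>> old = buckets;
        buckets = newBuckets(old.size() * 2);
        for (List<Entry<K, V>> bucket : old)
            for (Entry<K, V> e : bucket) put(e.key, e.value);
    }
}
```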

Mega-KV [7] is a high-performance key-value store system in which cuckoo hashing is used and key-value indexing is massively parallelized in batch mode by the GPU. With further optimizations of GPU acceleration by Nvidia and Oak Ridge National Lab, Mega-KV reached a new throughput record in 2018 (up to 888 million key-value operations per second). [8]

Contention handling

Concurrent accesses causing contention (marked in red).

As with any concurrent data structure, concurrent hash tables suffer from a variety of problems known in the field of concurrent computing as a result of contention. [3] Examples include the ABA problem, race conditions, and deadlocks. The extent to which these problems manifest, or whether they occur at all, depends on the implementation of the concurrent hash table; specifically, on which operations the table allows to be run concurrently, as well as on its strategies for mitigating problems associated with contention. When handling contention, the main goal is the same as with any other concurrent data structure: ensuring correctness for every operation on the table. At the same time, this should naturally be done in such a way as to be more efficient than a sequential solution when used concurrently. This is also known as concurrency control.

Atomic instructions

Using atomic instructions such as compare-and-swap or fetch-and-add, problems caused by contention can be reduced by ensuring that an access is completed before another access has the chance to interfere. Operations such as compare-and-swap often present limitations as to what size of data they can handle, meaning that the key and value types of a table have to be chosen or converted accordingly. [1]
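As a minimal sketch of this idea, assuming a fixed-size open-addressing table with string keys (the layout and all names are illustrative, not taken from the cited work), an insert can claim an empty slot with a single compare-and-swap:

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Sketch of contention handling via compare-and-swap: a fixed-size,
// open-addressing table where an empty slot is claimed atomically.
class CasTable {
    private final AtomicReferenceArray<String> keys;
    private final AtomicReferenceArray<String> values;

    CasTable(int capacity) {
        keys = new AtomicReferenceArray<>(capacity);
        values = new AtomicReferenceArray<>(capacity);
    }

    boolean insert(String key, String value) {
        int capacity = keys.length();
        int i = Math.floorMod(key.hashCode(), capacity);
        for (int probes = 0; probes < capacity; probes++, i = (i + 1) % capacity) {
            String k = keys.get(i);
            if (k == null) {
                // Claim the empty slot atomically; on failure another thread
                // won the race, so re-examine what now occupies the slot.
                if (keys.compareAndSet(i, null, key)) {
                    // Note: a real implementation must order the publication of
                    // key and value carefully; here a reader could briefly see
                    // the key before its value.
                    values.set(i, value);
                    return true;
                }
                k = keys.get(i);
            }
            if (key.equals(k)) return false; // key already present: insert discarded
        }
        return false; // table full (this sketch does not grow)
    }
}
```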

Using so-called hardware transactional memory (HTM), table operations can be thought of much like database transactions, [3] ensuring atomicity. An example of HTM in practice is Intel's Transactional Synchronization Extensions.

Locking

With the help of locks, operations trying to concurrently access the table or values within it can be handled in a way that ensures correct behavior. This can, however, lead to negative performance impacts, [1] [6] in particular when the locks used are too restrictive, blocking accesses that would otherwise not contend and could execute without causing any problems. Further considerations have to be made to avoid even more critical problems that threaten correctness, such as livelocks, deadlocks or starvation. [3]
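A common way to make locking less restrictive is lock striping, where each lock guards only a subset of the buckets, so that accesses to different stripes can proceed in parallel. The following Java sketch is illustrative only; the stripe count and structure are assumptions, not a description of any cited implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Lock-striping sketch: rather than one lock for the whole table, each of
// STRIPES locks guards its own segment, so operations on different segments
// do not block each other.
class StripedMap<K, V> {
    private static final int STRIPES = 16;
    private final ReentrantLock[] locks = new ReentrantLock[STRIPES];
    private final Map<K, V>[] segments;

    @SuppressWarnings("unchecked")
    StripedMap() {
        segments = (Map<K, V>[]) new Map[STRIPES];
        for (int i = 0; i < STRIPES; i++) {
            locks[i] = new ReentrantLock();
            segments[i] = new HashMap<>();
        }
    }

    private int stripe(Object key) { return Math.floorMod(key.hashCode(), STRIPES); }

    V get(K key) {
        int s = stripe(key);
        locks[s].lock();
        try { return segments[s].get(key); } finally { locks[s].unlock(); }
    }

    void put(K key, V value) {
        int s = stripe(key);
        locks[s].lock();
        try { segments[s].put(key, value); } finally { locks[s].unlock(); }
    }
}
```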

Phase concurrency

Concurrent accesses grouped into distinct phases.

A phase-concurrent hash table groups accesses by creating phases in which only one type of operation is allowed (e.g. a pure write phase), followed by a synchronization of the table state across all threads. A formally proven algorithm for this is given by Shun and Blelloch. [2]
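The access pattern, though not Shun and Blelloch's algorithm itself, can be sketched in Java using a standard barrier: all threads first run a pure write phase, synchronize so the table state is settled, and only then run a pure read phase. The map, thread count and key ranges are assumptions for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;

// Sketch of phase-concurrent access: a pure write phase, a barrier that
// synchronizes the table state across all threads, then a pure read phase.
public class PhasedAccess {
    public static void main(String[] args) throws InterruptedException {
        final int threads = 4, perThread = 1000;
        Map<Integer, Integer> table = new ConcurrentHashMap<>();
        CyclicBarrier phaseEnd = new CyclicBarrier(threads);

        Thread[] pool = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool[t] = new Thread(() -> {
                try {
                    for (int i = 0; i < perThread; i++)   // write phase: inserts only
                        table.put(id * perThread + i, i);
                    phaseEnd.await();                     // all writes now visible
                    for (int i = 0; i < perThread; i++)   // read phase: finds only
                        table.get(i);
                } catch (Exception e) { throw new RuntimeException(e); }
            });
            pool[t].start();
        }
        for (Thread t : pool) t.join();
        System.out.println("entries after write phase: " + table.size());
    }
}
```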

Read-copy-update

Widely used within the Linux kernel, [3] read-copy-update (RCU) is especially useful in cases where the number of reads far exceeds the number of writes. [4]
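RCU itself is a kernel-level mechanism, but its read-mostly character can be loosely illustrated in Java with a copy-on-write analogue (a simplification, not RCU proper): readers follow a snapshot reference without taking any lock, while writers publish a fully rebuilt copy of the table. This is also why such patterns only pay off when reads far outnumber writes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Copy-on-write analogue of the read-mostly idea behind RCU: reads are
// lock-free snapshot dereferences; each write copies and republishes the map.
class CowMap<K, V> {
    private final AtomicReference<Map<K, V>> current =
            new AtomicReference<>(new HashMap<>());

    V get(K key) {
        return current.get().get(key); // read side: no locks, just a snapshot
    }

    void put(K key, V value) {
        Map<K, V> prev, next;
        do {                            // update side: copy, modify, publish
            prev = current.get();
            next = new HashMap<>(prev);
            next.put(key, value);
        } while (!current.compareAndSet(prev, next));
    }
}
```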

Applications

Naturally, concurrent hash tables find application wherever sequential hash tables are useful. The advantage that concurrency delivers here lies in the potential speedup of these use-cases, as well as the increased scalability. [1] Considering hardware such as multi-core processors that is becoming increasingly capable of concurrent computation, the importance of concurrent data structures within these applications grows steadily. [3]

Performance analysis

Maier et al. [1] perform a thorough analysis of a variety of concurrent hash table implementations, giving insight into the effectiveness of each in situations that are likely to occur in real use-cases. The most important findings can be summarized as follows:

find: Very high speedups for both successful and unsuccessful unique finds, even under very high contention.

insert: High speedups are reached; high contention becomes problematic when keys can hold more than one value (otherwise inserts are simply discarded if the key already exists).

update: Both overwrites and modifications of existing values reach high speedups when contention is kept low; otherwise performance is worse than sequential.

delete: Phase concurrency reached the highest scalability; fully concurrent implementations in which delete uses update with dummy elements were close behind.

As expected, low contention leads to positive behavior across every operation, whereas high contention becomes problematic when it comes to writing. The latter is a problem of high contention in general: the benefit of concurrent computation is negated because the natural requirement for concurrency control restricts contending accesses, and the resulting overhead causes worse performance than that of an ideal sequential version. In spite of this, concurrent hash tables remain invaluable even in such high-contention scenarios, since a well-designed implementation can still achieve very high speedups by leveraging concurrency for reading data.

However, real use-cases of concurrent hash tables are often not simply sequences of the same operation, but rather a mixture of multiple types. As such, when a mixture of insert and find operations is used, the speedup and resulting usefulness of concurrent hash tables become more obvious, especially for find-heavy workloads.
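As an illustration, a find-heavy mixed workload against java.util.concurrent.ConcurrentHashMap [9] might look as follows; the 90/10 ratio and all parameters are assumptions for illustration, not figures from the cited analysis.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

// Sketch of a find-heavy mixed workload: eight threads each issue roughly
// 90% finds and 10% inserts against a shared concurrent hash table.
public class MixedWorkload {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<Integer, Integer> table = new ConcurrentHashMap<>();
        Thread[] pool = new Thread[8];
        for (int t = 0; t < pool.length; t++) {
            pool[t] = new Thread(() -> {
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                for (int i = 0; i < 100_000; i++) {
                    int key = rnd.nextInt(10_000);
                    if (rnd.nextInt(10) == 0) table.put(key, key); // ~10% inserts
                    else table.get(key);                           // ~90% finds
                }
            });
            pool[t].start();
        }
        for (Thread t : pool) t.join();
        System.out.println("distinct keys: " + table.size());
    }
}
```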

Ultimately, the resulting performance of a concurrent hash table depends on a variety of factors arising from its desired application. When choosing an implementation, it is important to determine the necessary degree of generality and the appropriate contention handling strategies, and to consider whether the size of the desired table can be determined in advance or whether a growing approach must be used instead.

Implementations
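
Widely used implementations of concurrent hash tables include:

- ConcurrentHashMap from the Java standard library [9]
- libcuckoo, a C++ implementation based on concurrent cuckoo hashing [10]
- the concurrent_unordered_map and concurrent_unordered_multimap [11] as well as concurrent_hash_map [12] containers of Threading Building Blocks
- growt, the growing hash tables of Maier et al. [1] [13]
- the concurrent hash maps of the folly library [14] [15]
- Junction, a C++ library of concurrent hash maps [16]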

See also


References

  1. Maier, Tobias; Sanders, Peter; Dementiev, Roman (March 2019). "Concurrent Hash Tables: Fast and General(?)!". ACM Transactions on Parallel Computing. New York: ACM. 5 (4), Article 16. doi:10.1145/3309206. ISSN 2329-4949. S2CID 67870641.
  2. Shun, Julian; Blelloch, Guy E. (2014). "Phase-concurrent Hash Tables for Determinism". SPAA '14: Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architectures. New York: ACM. pp. 96–107. doi:10.1145/2612669.2612687. ISBN 978-1-4503-2821-0.
  3. Li, Xiaozhou; Andersen, David G.; Kaminsky, Michael; Freedman, Michael J. (2014). "Algorithmic Improvements for Fast Concurrent Cuckoo Hashing". EuroSys '14: Proceedings of the Ninth European Conference on Computer Systems. New York: ACM. Article 27. doi:10.1145/2592798.2592820. ISBN 978-1-4503-2704-6.
  4. Triplett, Josh; McKenney, Paul E.; Walpole, Jonathan (2011). "Resizable, Scalable, Concurrent Hash Tables via Relativistic Programming". USENIX ATC '11: Proceedings of the 2011 USENIX Annual Technical Conference. Berkeley, CA: USENIX Association. p. 11.
  5. Herlihy, Maurice; Shavit, Nir (2008). "Chapter 13: Concurrent Hashing and Natural Parallelism". The Art of Multiprocessor Programming. San Francisco, CA: Morgan Kaufmann. pp. 316–325. ISBN 978-0-12-370591-4.
  6. Fan, Bin; Andersen, David G.; Kaminsky, Michael (2013). "MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing". NSDI '13: Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation. Berkeley, CA: USENIX Association. pp. 371–384.
  7. Zhang, Kai; Wang, Kaibo; Yuan, Yuan; Guo, Lei; Lee, Rubao; Zhang, Xiaodong (2015). "Mega-KV: A Case for GPUs to Maximize the Throughput of In-Memory Key-Value Stores". Proceedings of the VLDB Endowment. 8 (11).
  8. Chu, Ching-Hsing; Potluri, Sreeram; Goswami, Anshuman; Venkata, Manjunath Gorentla; Imam, Neena; Newburn, Chris J. (2018). "Designing High-Performance In-Memory Key-Value Operations with Persistent GPU Kernels and OpenSHMEM".
  9. Java ConcurrentHashMap documentation
  10. GitHub repository for libcuckoo
  11. Threading Building Blocks concurrent_unordered_map and concurrent_unordered_multimap documentation
  12. Threading Building Blocks concurrent_hash_map documentation
  13. GitHub repository for growt
  14. GitHub page for implementation of concurrent hash maps in folly
  15. GitHub repository for folly
  16. GitHub repository for Junction

Further reading