Consistent hashing

In computer science, consistent hashing [1] [2] is a special kind of hashing technique such that when a hash table is resized, only n/m keys need to be remapped on average, where n is the number of keys and m is the number of slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped because the mapping between the keys and the slots is defined by a modular operation.

History

The term "consistent hashing" was introduced by David Karger et al. at MIT for use in distributed caching, particularly for the web. [3] This 1997 academic paper in the Symposium on Theory of Computing introduced the term "consistent hashing" as a way of distributing requests among a changing population of web servers. [4] Each slot is then represented by a server in a distributed system or cluster. The addition of a server and the removal of a server (during scalability or outage) requires only n/m items to be re-shuffled when the number of slots (i.e. servers) changes. The authors mention linear hashing and its ability to handle sequential server addition and removal, while consistent hashing allows servers to be added and removed in an arbitrary order. [1] The paper was later re-purposed to address the technical challenge of keeping track of a file in peer-to-peer networks such as a distributed hash table. [5] [6]

Teradata used this technique in their distributed database, released in 1986, although they did not use this term. Teradata still uses the concept of a hash table to fulfill exactly this purpose. Akamai Technologies was founded in 1998 by the scientists Daniel Lewin and F. Thomson Leighton (co-authors of the article coining "consistent hashing"). In Akamai's content delivery network, [7] consistent hashing is used to balance the load within a cluster of servers, while a stable marriage algorithm is used to balance load across clusters. [2]

Consistent hashing has also been used to reduce the impact of partial system failures in large web applications to provide robust caching without incurring the system-wide fallout of a failure. [8] Consistent hashing is also the cornerstone of distributed hash tables (DHTs), which employ hash values to partition a keyspace across a distributed set of nodes, then construct an overlay network of connected nodes that provide efficient node retrieval by key.

Rendezvous hashing, designed in 1996, is a simpler and more general technique [ citation needed ]. It achieves the goals of consistent hashing using the very different highest random weight (HRW) algorithm.

Basic technique

[Figure: Consistent Hashing Sample Illustration — in this case, using consistent hashing would result in the "BLOB" getting stored on server 139. A BLOB is mapped to the next server that appears on the circle in clockwise order until it reaches a server whose ID satisfies ζ ≤ server ID.]

In the problem of load balancing, for example, when a BLOB has to be assigned to one of n servers on a cluster, a standard hash function could be used in such a way that we calculate the hash value for that BLOB. Assuming the resultant value of the hash is β, we perform a modular operation with the number of servers (n in this case) to determine the server in which we can place the BLOB: β % n; hence the BLOB will be placed in the server whose server ID is β % n. However, when a server is added or removed during an outage or scaling (when n changes), all the BLOBs in every server must be reassigned and moved due to rehashing, and this operation is expensive.
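
The cost of this rehashing is easy to see in a short simulation. The following Python sketch is purely illustrative (the server counts and key names are made up): it assigns BLOBs with β % n and counts how many move when a sixth server joins.

```python
import hashlib

def blob_hash(blob: str) -> int:
    # Stable integer hash of a BLOB's name (Python's built-in hash() is salted per process).
    return int(hashlib.md5(blob.encode()).hexdigest(), 16)

def mod_placement(blob: str, num_servers: int) -> int:
    # Naive placement: server ID = hash value modulo the number of servers.
    return blob_hash(blob) % num_servers

blobs = [f"blob-{i}" for i in range(10_000)]
before = {b: mod_placement(b, 5) for b in blobs}  # 5 servers
after = {b: mod_placement(b, 6) for b in blobs}   # a 6th server is added
moved = sum(before[b] != after[b] for b in blobs)
print(f"{moved / len(blobs):.0%} of BLOBs changed servers")  # roughly 83% relocate
```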

Consistent hashing was designed to avoid the problem of having to reassign every BLOB when a server is added or removed throughout the cluster. The central idea is to use a hash function that maps both the BLOBs and the servers to a unit circle, usually 2π radians. For example, ζ = Φ mod 360 (where Φ is the hash of a BLOB's or server's identifier, like an IP address or a UUID). Each BLOB is then assigned to the next server that appears on the circle in clockwise order. Usually, binary search or linear search is used to find the "spot" or server to place that particular BLOB, with O(log N) or O(N) complexity respectively; and in every iteration, which happens in clockwise manner, the check ζ ≤ server ID (where server ID is the hash value of the server within the cluster) is performed to find the server to place the BLOB. This provides an even distribution of BLOBs to servers. But, more importantly, if a server fails and is removed from the circle, only the BLOBs that were mapped to the failed server need to be reassigned to the next server in clockwise order. Likewise, if a new server is added, it is added to the unit circle, and only the BLOBs mapped to that server need to be reassigned.
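
A minimal sketch of this clockwise rule, assuming hypothetical server identifiers, a 32-bit position space in place of the 360° circle described above, and a binary search over a sorted list of positions rather than an explicit tree:

```python
import bisect
import hashlib

CIRCLE = 2**32  # stand-in for the 360-degree circle; positions are taken modulo this value

def position(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % CIRCLE

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]    # hypothetical server identifiers
ring = sorted((position(s), s) for s in servers)  # servers placed on the circle
points = [p for p, _ in ring]

def owner(blob: str) -> str:
    # First server at or after the BLOB's position, wrapping past the largest position.
    i = bisect.bisect_left(points, position(blob))
    return ring[i % len(ring)][1]

print(owner("some-blob"))  # e.g. '10.0.0.2'
```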

Importantly, when a server is added or removed, the vast majority of the BLOBs maintain their prior server assignments, and the addition of the k-th server only causes a 1/k fraction of the BLOBs to relocate. Although the process of moving BLOBs across cache servers in the cluster depends on the context, commonly the newly added cache server identifies its "successor" and moves from it all the BLOBs whose mapping now belongs to the new server (i.e. whose hash value is less than that of the new server). However, in the case of web page caches, most implementations involve no moving or copying at all, assuming the cached BLOB is small enough. When a request hits a newly added cache server, a cache miss happens, a request to the actual web server is made, and the BLOB is cached locally for future requests. The now-redundant BLOBs on the previously used cache servers are removed as per the cache eviction policies. [9]

Implementation

Let h_B(x) and h_S(x) be the hash functions used for the BLOB and the server's unique identifier respectively. In practice, a binary search tree (BST) is used to dynamically maintain the server IDs within a cluster or hashring, and to find the successor or minimum within the BST, tree traversal is used.

Inserting a BLOB into the cluster
Let β be the hash value of a BLOB x such that β = h_B(x) % 360, where 0 ≤ β ≤ 360. To insert x, find the successor of β in the BST of server IDs. If β is larger than all of the server IDs, the BLOB is placed in the server with the smallest server ID value.
Deleting a BLOB from the cluster
Find the successor of β in the BST and remove the BLOB from the returned server. If β has no successor, remove the BLOB from the server with the smallest server ID. [10]
Inserting a server into the cluster
Let θ be the hash value of a server's identifier such that θ = h_S(s) % 360, where 0 ≤ θ ≤ 360. Move all the BLOBs whose hash value is smaller than θ from the server whose server ID is the successor of θ. If θ is the largest of all the server IDs, move the relevant BLOBs from the server with the smallest server ID into the new server. [11]
Deleting a server from the cluster
Find the successor of θ in the BST and move the BLOBs of the departing server into its successor server. If θ does not have a successor, move the BLOBs into the server with the smallest server ID. [12]
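
A compact Python sketch of these operations follows. It is illustrative only (the class and helper names are mine, not from the cited sources); a sorted list plus bisect stands in for the BST, and positions are taken modulo 2^32 rather than 360 so that hash collisions stay unlikely in the sketch.

```python
import bisect
import hashlib

def _pos(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % 2**32  # position on the circle

class HashRing:
    def __init__(self) -> None:
        self._angles: list[int] = []        # sorted server positions (stand-in for the BST)
        self._servers: dict[int, str] = {}  # position -> server identifier

    def _successor(self, angle: int) -> int:
        # Smallest server position >= angle, wrapping around to the minimum.
        i = bisect.bisect_left(self._angles, angle)
        return self._angles[i % len(self._angles)]

    def lookup(self, blob: str) -> str:
        # Inserting or deleting a BLOB boils down to this successor query.
        return self._servers[self._successor(_pos(blob))]

    def add_server(self, server_id: str) -> str | None:
        # Only the new server's successor has to hand over BLOBs; return it.
        theta = _pos(server_id)
        donor = self._servers[self._successor(theta)] if self._angles else None
        bisect.insort(self._angles, theta)
        self._servers[theta] = server_id
        return donor

    def remove_server(self, server_id: str) -> str | None:
        # The departing server's BLOBs move to its successor; return it.
        theta = _pos(server_id)
        self._angles.remove(theta)
        del self._servers[theta]
        return self._servers[self._successor(theta)] if self._angles else None

ring = HashRing()
for s in ("10.0.0.1", "10.0.0.2", "10.0.0.3"):
    ring.add_server(s)
print(ring.lookup("some-blob"))
```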

Variance reduction

To avoid the skew that occurs when multiple nodes fall within a small arc of the circle, which happens due to the lack of a uniform distribution of the servers within the cluster, multiple labels are used. Those duplicate labels are called "virtual nodes", i.e. multiple labels which point to a single "real" label or server within the cluster. The number of virtual nodes or duplicate labels used for a particular server within a cluster is called the "weight" of that particular server. [13]
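
A sketch of virtual nodes follows (the server identifiers and weights are illustrative, not from the cited sources): each server is hashed onto the circle under several labels, and a server with twice the weight receives roughly twice the BLOBs.

```python
import bisect
import hashlib

def _pos(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % 2**32

def build_ring(weights: dict[str, int]) -> list[tuple[int, str]]:
    # weights maps a server identifier to its number of virtual nodes (its "weight").
    ring = []
    for server, weight in weights.items():
        for label in range(weight):
            ring.append((_pos(f"{server}#vnode{label}"), server))  # one circle position per label
    return sorted(ring)

def lookup(ring: list[tuple[int, str]], blob: str) -> str:
    i = bisect.bisect_left(ring, (_pos(blob), ""))
    return ring[i % len(ring)][1]

# "10.0.0.3" has weight 200, so it should receive about half of all BLOBs.
ring = build_ring({"10.0.0.1": 100, "10.0.0.2": 100, "10.0.0.3": 200})
print(lookup(ring, "some-blob"))
```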

Practical extensions

A number of extensions to the basic technique are needed for effectively using consistent hashing for load balancing in practice. In the basic scheme above, if a server fails, all its BLOBs are reassigned to the next server in clockwise order, potentially doubling the load of that server. This may not be desirable. To ensure a more even redistribution of BLOBs on server failure, each server can be hashed to multiple locations on the unit circle. When a server fails, the BLOBs assigned to each of its replicas on the unit circle will get reassigned to a different server in clockwise order, thus redistributing the BLOBs more evenly. Another extension concerns a situation where a single BLOB gets "hot" and is accessed a large number of times and will have to be hosted in multiple servers. In this situation, the BLOB may be assigned to multiple contiguous servers by traversing the unit circle in clockwise order. A more complex practical consideration arises when two BLOBs are hashed near each other in the unit circle and both get "hot" at the same time. In this case, both BLOBs will use the same set of contiguous servers in the unit circle. This situation can be ameliorated by each BLOB choosing a different hash function for mapping servers to the unit circle. [2]
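
A rough sketch of the "hot BLOB" extension (the helper names are mine; the ring argument is the sorted list of (position, server) pairs built in the sketches above): walk clockwise from the BLOB's position and collect the next k distinct servers.

```python
import bisect
import hashlib

def _pos(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % 2**32

def owners(ring: list[tuple[int, str]], blob: str, k: int) -> list[str]:
    # ring is a sorted list of (position, server) pairs, as in the earlier sketches.
    i = bisect.bisect_left(ring, (_pos(blob), ""))
    found: list[str] = []
    for step in range(len(ring)):
        server = ring[(i + step) % len(ring)][1]
        if server not in found:
            found.append(server)
        if len(found) == k:
            break
    return found  # the k contiguous servers that can all serve the hot BLOB
```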

Comparison with rendezvous hashing and other alternatives

Rendezvous hashing, designed in 1996, is a simpler and more general technique, and permits fully distributed agreement on a set of k options out of a possible set of n options. It can in fact be shown that consistent hashing is a special case of rendezvous hashing. Because of its simplicity and generality, rendezvous hashing is now being used in place of consistent hashing in many applications.
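
For contrast, a minimal highest random weight (HRW) sketch with illustrative names: each client scores every (server, key) pair with a hash and picks the maximum, so no ring or sorted structure is needed.

```python
import hashlib

def score(server: str, key: str) -> int:
    # Combined hash of the (server, key) pair; any well-mixed hash works.
    return int(hashlib.md5(f"{server}/{key}".encode()).hexdigest(), 16)

def hrw_owner(servers: list[str], key: str) -> str:
    # Every client independently agrees on the server with the highest score.
    return max(servers, key=lambda s: score(s, key))

# Removing a server only remaps the keys for which it held the maximum score.
print(hrw_owner(["10.0.0.1", "10.0.0.2", "10.0.0.3"], "some-blob"))
```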

If key values will always increase monotonically, an alternative approach using a hash table with monotonic keys may be more suitable than consistent hashing.[ citation needed ]

Complexity

Asymptotic time complexities for N nodes (or slots) and K keys

                  Classic hash table    Consistent hashing
add a node        O(K)                  O(K/N + log N)
remove a node     O(K)                  O(K/N + log N)
lookup a key      O(1)                  O(log N)
add a key         O(1)                  O(log N)
remove a key      O(1)                  O(log N)

The O(K/N) is an average cost for the redistribution of keys, and the O(log N) complexity for consistent hashing comes from the fact that a binary search among the N node angles is required to find the next node on the ring.[ citation needed ]

Examples

Known examples of consistent hashing use include:
- Couchbase automated data partitioning [14]
- Partitioning in OpenStack's Object Storage Service Swift [15]
- Partitioning component of Amazon's storage system Dynamo [16]
- Data partitioning in Apache Cassandra [17]
- Data partitioning in ScyllaDB [18]
- Data partitioning in Voldemort [19]
- Akka's consistent hashing router [20]
- Riak, a distributed key-value database [21]
- GlusterFS, a network-attached storage file system [22]
- Akamai content delivery network [23]
- Discord chat application [24]
- Consistent hash load balancing for gRPC requests, e.g. in SpiceDB [25]
- Chord algorithm [26]
- MinIO object storage system [27]


References

  1. Karger, D.; Lehman, E.; Leighton, T.; Panigrahy, R.; Levine, M.; Lewin, D. (1997). Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web. Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing. ACM Press New York, NY, USA. pp. 654–663. doi:10.1145/258533.258660.
  2. Bruce Maggs and Ramesh Sitaraman (2015). "Algorithmic nuggets in content delivery" (PDF). ACM SIGCOMM Computer Communication Review. 45 (3).
  3. Roughgarden & Valiant 2021, p. 2.
  4. Roughgarden & Valiant 2021, p. 7.
  5. Roughgarden & Valiant 2021, p. 8.
  6. I. Stoica et al., "Chord: a scalable peer-to-peer lookup protocol for Internet applications," in IEEE/ACM Transactions on Networking, vol. 11, no. 1, pp. 17–32, Feb. 2003, doi: 10.1109/TNET.2002.808407.
  7. Nygren, E.; Sitaraman, R. K.; Sun, J. (2010). "The Akamai Network: A Platform for High-Performance Internet Applications" (PDF). ACM SIGOPS Operating Systems Review. 44 (3): 2–19. doi:10.1145/1842733.1842736. S2CID 207181702. Archived (PDF) from the original on November 30, 2022. Retrieved August 29, 2023.
  8. Karger, D.; Sherman, A.; Berkheimer, A.; Bogstad, B.; Dhanidina, R.; Iwamoto, K.; Kim, B.; Matkins, L.; Yerushalmi, Y. (1999). "Web Caching with Consistent Hashing". Computer Networks. 31 (11): 1203–1213. doi:10.1016/S1389-1286(99)00055-9. Archived from the original on 2008-07-21. Retrieved 2008-02-05.
  9. Roughgarden & Valiant 2021, p. 6.
  10. Moitra 2016, p. 2.
  11. Moitra 2016, pp. 2–3.
  12. Moitra 2016, p. 3.
  13. Roughgarden & Valiant 2021, pp. 6–7.
  14. "What Exactly Is Membase?". 16 December 2014. Retrieved 2020-10-29.
  15. Holt, Greg (February 2011). "Building a Consistent Hashing Ring". openstack.org. Retrieved 2019-11-17.
  16. DeCandia, G.; Hastorun, D.; Jampani, M.; Kakulapati, G.; Lakshman, A.; Pilchin, A.; Sivasubramanian, S.; Vosshall, P.; Vogels, Werner (2007). "Dynamo" (PDF). ACM SIGOPS Operating Systems Review. 41 (6): 205–220. doi:10.1145/1323293.1294281. Retrieved 2018-06-07.
  17. Lakshman, Avinash; Malik, Prashant (2010). "Cassandra: a decentralized structured storage system". ACM SIGOPS Operating Systems Review. 44 (2): 35–40. doi:10.1145/1773912.1773922. S2CID   916681.
  18. "NoSQL Comparison: MongoDB vs ScyllaDB". benchant.com. Retrieved 21 March 2024.
  19. "Design -- Voldemort". www.project-voldemort.com/. Archived from the original on 9 February 2015. Retrieved 9 February 2015. Consistent hashing is a technique that avoids these problems, and we use it to compute the location of each key on the cluster.
  20. "Akka Routing". akka.io. Retrieved 2019-11-16.
  21. "Riak Concepts". Archived from the original on 2015-09-19. Retrieved 2016-12-06.
  22. "GlusterFS Algorithms: Distribution". gluster.org. 2012-03-01. Retrieved 2019-11-16.
  23. Roughgarden, Tim; Valiant, Gregory (2016-03-28). "Modern Algorithmic Toolbox" (PDF). stanford.edu. Retrieved 2019-11-17.
  24. Vishnevskiy, Stanislav (2017-07-06). "How Discord Scaled Elixir to 5,000,000 Concurrent Users" . Retrieved 2022-08-16.
  25. "Consistent Hash Load Balancing for gRPC". 24 November 2021. Retrieved 2023-09-04.
  26. Stoica, I.; Morris, R.; Liben-Nowell, D.; Karger, D.; Kaashoek, M. F.; Dabek, F.; Balakrishnan, H. (25 Feb 2003). "Chord: a scalable peer-to-peer lookup protocol for Internet applications". IEEE/ACM Transactions on Networking. 11 (1): 17–32. doi:10.1109/TNET.2002.808407. S2CID   221276912.
  27. "MinIO Versioning, Metadata and Storage Deep Dive" . Retrieved 2023-10-24.

Works cited