Spiral hashing

Last updated

Spiral hashing, also known as Spiral Storage is an extensible hashing algorithm. [1] [2] [3] [4] [5] As in all hashing schemes, spiral hashing stores records in a varying number of buckets, using a record key for addressing. In an expanding Linear hashing file, buckets are split in a predefined order. This results in adding a new bucket at the end of the file. While this allows gradual reorganization of the file, the expected number of records in the newly created bucket and the bucket from what it splits falls to half the previous number. Several attempts were made to alleviate this sudden drop in space utilization. Martin's spiral storage uses a different approach. The file consists of a number of continuously numbered buckets. The lower-numbered (left) buckets have a higher expected number of records. When the file expands, the left-most bucket is replaced by two buckets on the right. Some variants of this idea exist. [6] [7] [8]

Spiral hashing requires a uniform hash function of the keys of the records into the unit interval . If the hash file starts at bucket , the key is mapped into a real number . The final address is then computed as where is the "extension factor". When is incremented, approximately new buckets are created on the right. Larson [9] conducted experiments that showed that Linear hashing still had superior performance over Spiral Hashing.

See also

Related Research Articles

<span class="mw-page-title-main">Hash function</span> Mapping arbitrary data to fixed-size values

A hash function is any function that can be used to map data of arbitrary size to fixed-size values, though there are some hash functions that support variable length output. The values returned by a hash function are called hash values, hash codes, digests, or simply hashes. The values are usually used to index a fixed-size table called a hash table. Use of a hash function to index a hash table is called hashing or scatter storage addressing.

<span class="mw-page-title-main">Hash table</span> Associative array for storing key-value pairs

In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored.

<span class="mw-page-title-main">Trie</span> K-ary search tree data structure

In computer science, a trie, also called digital tree or prefix tree, is a type of k-ary search tree, a tree data structure used for locating specific keys from within a set. These keys are most often strings, with links between nodes defined not by the entire key, but by individual characters. In order to access a key, the trie is traversed depth-first, following the links between nodes, which represent each character in the key.

<span class="mw-page-title-main">Distributed hash table</span> Decentralized distributed system with lookup service

A distributed hash table (DHT) is a distributed system that provides a lookup service similar to a hash table. Key–value pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key. The main advantage of a DHT is that nodes can be added or removed with minimum work around re-distributing keys. Keys are unique identifiers which map to particular values, which in turn can be anything from addresses, to documents, to arbitrary data. Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows a DHT to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.

<span class="mw-page-title-main">Longest common subsequence</span> Algorithmic problem on pairs of sequences

A longest common subsequence (LCS) is the longest subsequence common to all sequences in a set of sequences. It differs from the longest common substring: unlike substrings, subsequences are not required to occupy consecutive positions within the original sequences. The problem of computing longest common subsequences is a classic computer science problem, the basis of data comparison programs such as the diff utility, and has applications in computational linguistics and bioinformatics. It is also widely used by revision control systems such as Git for reconciling multiple changes made to a revision-controlled collection of files.

<span class="mw-page-title-main">Perfect hash function</span> Hash function without any collisions

In computer science, a perfect hash functionh for a set S is a hash function that maps distinct elements in S to a set of m integers, with no collisions. In mathematical terms, it is an injective function.

Kademlia is a distributed hash table for decentralized peer-to-peer computer networks designed by Petar Maymounkov and David Mazières in 2002. It specifies the structure of the network and the exchange of information through node lookups. Kademlia nodes communicate among themselves using UDP. A virtual or overlay network is formed by the participant nodes. Each node is identified by a number or node ID. The node ID serves not only as identification, but the Kademlia algorithm uses the node ID to locate values.

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not – in other words, a query returns either "possibly in set" or "definitely not in set". Elements can be added to the set, but not removed ; the more items added, the larger the probability of false positives.

In computer science, a disjoint-set data structure, also called a union–find data structure or merge–find set, is a data structure that stores a collection of disjoint (non-overlapping) sets. Equivalently, it stores a partition of a set into disjoint subsets. It provides operations for adding new sets, merging sets, and finding a representative member of a set. The last operation makes it possible to find out efficiently if any two elements are in the same or different sets.

<span class="mw-page-title-main">Linear probing</span> Computer programming method for hashing

Linear probing is a scheme in computer programming for resolving collisions in hash tables, data structures for maintaining a collection of key–value pairs and looking up the value associated with a given key. It was invented in 1954 by Gene Amdahl, Elaine M. McGraw, and Arthur Samuel and first analyzed in 1963 by Donald Knuth.

In computer science, consistent hashing is a special kind of hashing technique such that when a hash table is resized, only keys need to be remapped on average where is the number of keys and is the number of slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped because the mapping between the keys and the slots is defined by a modular operation.

Linear hashing (LH) is a dynamic data structure which implements a hash table and grows or shrinks one bucket at a time. It was invented by Witold Litwin in 1980. It has been analyzed by Baeza-Yates and Soza-Pollman. It is the first in a number of schemes known as dynamic hashing such as Larson's Linear Hashing with Partial Extensions, Linear Hashing with Priority Splitting, Linear Hashing with Partial Expansions and Priority Splitting, or Recursive Linear Hashing.

<span class="mw-page-title-main">Cuckoo hashing</span> Data structure hashing scheme

Cuckoo hashing is a scheme in computer programming for resolving hash collisions of values of hash functions in a table, with worst-case constant lookup time. The name derives from the behavior of some species of cuckoo, where the cuckoo chick pushes the other eggs or young out of the nest when it hatches in a variation of the behavior referred to as brood parasitism; analogously, inserting a new key into a cuckoo hashing table may push an older key to a different location in the table.

Extendible hashing is a type of hash system which treats a hash as a bit string and uses a trie for bucket lookup. Because of the hierarchical nature of the system, re-hashing is an incremental operation. This means that time-sensitive applications are less affected by table growth than by standard full-table rehashes.

In computer science, a succinct data structure is a data structure which uses an amount of space that is "close" to the information-theoretic lower bound, but still allows for efficient query operations. The concept was originally introduced by Jacobson to encode bit vectors, (unlabeled) trees, and planar graphs. Unlike general lossless data compression algorithms, succinct data structures retain the ability to use them in-place, without decompressing them first. A related notion is that of a compressed data structure, insofar as the size of the stored or encoded data similarly depends upon the specific content of the data itself.

In computer science, locality-sensitive hashing (LSH) is an algorithmic technique that hashes similar input items into the same "buckets" with high probability. Since similar items end up in the same buckets, this technique can be used for data clustering and nearest neighbor search. It differs from conventional hashing techniques in that hash collisions are maximized, not minimized. Alternatively, the technique can be seen as a way to reduce the dimensionality of high-dimensional data; high-dimensional input items can be reduced to low-dimensional versions while preserving relative distances between items.

In computer science, streaming algorithms are algorithms for processing data streams in which the input is presented as a sequence of items and can be examined in only a few passes, typically just one. These algorithms are designed to operate with limited memory, generally logarithmic in the size of the stream and/or in the maximum value in the stream, and may also have limited processing time per item.

In computer science, dynamic perfect hashing is a programming technique for resolving collisions in a hash table data structure. While more memory-intensive than its hash table counterparts, this technique is useful for situations where fast queries, insertions, and deletions must be made on a large set of elements.

<span class="mw-page-title-main">Bucket queue</span> Data structure for integer priorities

A bucket queue is a data structure that implements the priority queue abstract data type: it maintains a dynamic collection of elements with numerical priorities and allows quick access to the element with minimum priority. In the bucket queue, the priorities must be integers, and it is particularly suited to applications in which the priorities have a small range. A bucket queue has the form of an array of buckets: an array data structure, indexed by the priorities, whose cells contain collections of items with the same priority as each other. With this data structure, insertion of elements and changes of their priority take constant time. Searching for and removing the minimum-priority element takes time proportional to the number of buckets or, by maintaining a pointer to the most recently found bucket, in time proportional to the difference in priorities between successive operations.

<span class="mw-page-title-main">Searchable symmetric encryption</span> System allowing searching of encrypted documents

Searchable symmetric encryption (SSE) is a form of encryption that allows one to efficiently search over a collection of encrypted documents or files without the ability to decrypt them. SSE can be used to outsource files to an untrusted cloud storage server without ever revealing the files in the clear but while preserving the server's ability to search over them.

References

  1. Martin, G.N. (1979), "Spiral Storage", Tech. Rep. 27, Univ. Of Warwick, Coventry, UK
  2. Mullin, James K. (1981), "Tightly controlled linear hashing without separate overflow storage", BIT Numerical Mathematics, 21 (4): 390–400, doi:10.1007/BF01932837, S2CID   43311559
  3. Mullin, James K. (1985), "Spiral storage: Efficient dynamic hashing with constant performance", The Computer Journal, 28 (3): 330–334, doi: 10.1093/comjnl/28.3.330
  4. Chu, J-H.; Knott, G.D. (1994), "An analysis of spiral hashing", The Computer Journal, 37 (8): 715-719, doi: 10.1093/comjnl/37.8.715
  5. Enbody, Richard; Du, HC (June 1988), "Dynamic hashing schemes", ACM Computing Surveys, 20 (2): 85–113, doi: 10.1145/46157.330532 , S2CID   1437123
  6. Mullin, James K. (1984), "Unified Dynamic Hashing", Proceedings 10th International Conference on Very Large Databases (VLDB)
  7. Kawagoe, K. (1985), "Modified Dynamic Hashing", Proceedings SIGMOD international conference on management of data
  8. Chang, Ye-In; Lee, Chien-I; ChangLiaw, Wann-Bay (1999), "Linear spiral hashing for expansible files", IEEE Transactions on Knowledge and Data Engineering, 11 (6)
  9. Larson, Per-Åke (April 1988), "Dynamic Hash Tables", Communications of the ACM, 31 (4): 446–457, doi: 10.1145/42404.42410 , S2CID   207548097