Quadratic probing

Last updated June 21, 2023

Quadratic probing is an open addressing scheme in computer programming for resolving hash collisions in hash tables. Quadratic probing operates by taking the original hash index and adding successive values of an arbitrary quadratic polynomial until an open slot is found.

Quadratic probing can be a more efficient algorithm in an open addressing table, since it better avoids the clustering problem that can occur with linear probing, although it is not immune. It also provides good memory caching because it preserves some locality of reference; however, linear probing has greater locality and, thus, better cache performance.^{[ dubious – discuss ]}^{[ citation needed ]}

Quadratic function

Let h(k) be a hash function that maps an element k to an integer in [0, m−1], where m is the size of the table. Let the i^th probe position for a value k be given by the function

h(k,i)=h(k)+c_{1}i+c_{2}i^{2}{\pmod {m}}

where c₂ ≠ 0 (If c₂ = 0, then h(k,i) degrades to a linear probe). For a given hash table, the values of c₁ and c₂ remain constant.

Examples:

If $h(k,i)=(h(k)+i+i^{2}){\pmod {m}}$ , then the probe sequence will be $h(k),h(k)+2,h(k)+6,...$
For m = 2ⁿ, a good choice for the constants are c₁ = c₂ = 1/2, as the values of h(k,i) for i in [0, m−1] are all distinct (in fact, it is a permutation on [0, m−1]^[1]). This leads to a probe sequence of $h(k),h(k)+1,h(k)+3,h(k)+6,...$ (the triangular numbers) where the values increase by 1, 2, 3, ...
For prime m > 2, most choices of c₁ and c₂ will make h(k,i) distinct for i in [0, (m−1)/2]. Such choices include c₁ = c₂ = 1/2, c₁ = c₂ = 1, and c₁ = 0, c₂ = 1. However, there are only m/2 distinct probes for a given element, requiring other techniques to guarantee that insertions will succeed when the load factor exceeds 1/2.
For $m=n^{p}$ , where m, n, and p are integer greater or equal 2 (degrades to linear probe when p = 1), then $h(k,i)=(h(k)+i+ni^{2}){\pmod {m}}$ gives cycle of all distinct probes. It can be computed in loop as: $h(k,0)=h(k)$ , and $h(k,i+1)=(h(k,i)+2in+n+1){\pmod {m}}$
For any m, full cycle with quadratic probing can be achieved by rounding up m to closest power of 2, compute probe index: $h(k,i)=h(k)+((i^{2}+i)/2){\pmod {roundUp2(m)}}$ , and skip iteration when $h(k,i)>=m$ . There is maximum $roundUp2(m)-m<m/2$ skipped iterations, and these iterations do not refer to memory, so it is fast operation on most modern processors. Rounding up m can be computed by:

uint64_troundUp2(uint64_tv){v--;v|=v>>1;v|=v>>2;v|=v>>4;v|=v>>8;v|=v>>16;v|=v>>32;v++;returnv;}

Limitations

Alternating signs

If the sign of the offset is alternated (e.g. +1, −4, +9, −16, etc.), and if the number of buckets is a prime number $p$ congruent to 3 modulo 4 (e.g. 3, 7, 11, 19, 23, 31, etc.), then the first $p$ offsets will be unique (modulo $p$ ).^{[ further explanation needed ]} In other words, a permutation of 0 through $p-1$ is obtained, and, consequently, a free bucket will always be found as long as at least one exists.

External links

Tutorial/quadratic probing

↑ The Art of Computer Science Volume 3 Sorting and Searching, Chapter 6.4, exercise 20, Donald Knuth

Related Research Articles

In mathematics, the Chinese remainder theorem states that if one knows the remainders of the Euclidean division of an integer n by several integers, then one can determine uniquely the remainder of the division of n by the product of these integers, under the condition that the divisors are pairwise coprime.

A hash function is any function that can be used to map data of arbitrary size to fixed-size values, though there are some hash functions that support variable length output. The values returned by a hash function are called hash values, hash codes, digests, or simply hashes. The values are usually used to index a fixed-size table called a hash table. Use of a hash function to index a hash table is called hashing or scatter storage addressing.

In number theory, the Legendre symbol is a multiplicative function with values 1, −1, 0 that is a quadratic character modulo of an odd prime number p: its value at a (nonzero) quadratic residue mod p is 1 and at a non-quadratic residue (non-residue) is −1. Its value at zero is 0.

In mathematics, modular arithmetic is a system of arithmetic for integers, where numbers "wrap around" when reaching a certain value, called the modulus. The modern approach to modular arithmetic was developed by Carl Friedrich Gauss in his book Disquisitiones Arithmeticae, published in 1801.

RSA (Rivest–Shamir–Adleman) is a public-key cryptosystem, one of the oldest, that is widely used for secure data transmission. The acronym "RSA" comes from the surnames of Ron Rivest, Adi Shamir and Leonard Adleman, who publicly described the algorithm in 1977. An equivalent system was developed secretly in 1973 at Government Communications Headquarters (GCHQ) by the English mathematician Clifford Cocks. That system was declassified in 1997.

The Jacobi symbol is a generalization of the Legendre symbol. Introduced by Jacobi in 1837, it is of theoretical interest in modular arithmetic and other branches of number theory, but its main use is in computational number theory, especially primality testing and integer factorization; these in turn are important in cryptography.

In number theory, Euler's criterion is a formula for determining whether an integer is a quadratic residue modulo a prime. Precisely,

The quadratic sieve algorithm (QS) is an integer factorization algorithm and, in practice, the second fastest method known. It is still the fastest for integers under 100 decimal digits or so, and is considerably simpler than the number field sieve. It is a general-purpose factorization algorithm, meaning that its running time depends solely on the size of the integer to be factored, and not on special structure or properties. It was invented by Carl Pomerance in 1981 as an improvement to Schroeppel's linear sieve.

In number theory, quadratic Gauss sums are certain finite sums of roots of unity. A quadratic Gauss sum can be interpreted as a linear combination of the values of the complex exponential function with coefficients given by a quadratic character; for a general character, one obtains a more general Gauss sum. These objects are named after Carl Friedrich Gauss, who studied them extensively and applied them to quadratic, cubic, and biquadratic reciprocity laws.

In mathematics, Hensel's lemma, also known as Hensel's lifting lemma, named after Kurt Hensel, is a result in modular arithmetic, stating that if a univariate polynomial has a simple root modulo a prime number $p$ , then this root can be lifted to a unique root modulo any higher power of $p$ . More generally, if a polynomial factors modulo $p$ into two coprime polynomials, this factorization can be lifted to a factorization modulo any higher power of $p$ .

In cryptography, a message authentication code based on universal hashing, or UMAC, is a type of message authentication code (MAC) calculated choosing a hash function from a class of hash functions according to some secret (random) process and applying it to the message. The resulting digest or fingerprint is then encrypted to hide the identity of the hash function used. As with any MAC, it may be used to simultaneously verify both the data integrity and the authenticity of a message.

Gauss's lemma in number theory gives a condition for an integer to be a quadratic residue. Although it is not useful computationally, it has theoretical significance, being involved in some proofs of quadratic reciprocity.

The Goldwasser–Micali (GM) cryptosystem is an asymmetric key encryption algorithm developed by Shafi Goldwasser and Silvio Micali in 1982. GM has the distinction of being the first probabilistic public-key encryption scheme which is provably secure under standard cryptographic assumptions. However, it is not an efficient cryptosystem, as ciphertexts may be several hundred times larger than the initial plaintext. To prove the security properties of the cryptosystem, Goldwasser and Micali proposed the widely used definition of semantic security.

In mathematics and computing, universal hashing refers to selecting a hash function at random from a family of hash functions with a certain mathematical property. This guarantees a low number of collisions in expectation, even if the data is chosen by an adversary. Many universal families are known, and their evaluation is often very efficient. Universal hashing has numerous uses in computer science, for example in implementations of hash tables, randomized algorithms, and cryptography.

In cryptography, the Rabin signature algorithm is a method of digital signature originally proposed by Michael O. Rabin in 1978.

The Lehmer random number generator, sometimes also referred to as the Park–Miller random number generator, is a type of linear congruential generator (LCG) that operates in multiplicative group of integers modulo n. The general formula is

In cryptography, Very Smooth Hash (VSH) is a provably secure cryptographic hash function invented in 2005 by Scott Contini, Arjen Lenstra and Ron Steinfeld. Provably secure means that finding collisions is as difficult as some known hard mathematical problem. Unlike other provably secure collision-resistant hashes, VSH is efficient and usable in practice. Asymptotically, it only requires a single multiplication per log(n) message-bits and uses RSA-type arithmetic. Therefore, VSH can be useful in embedded environments where code space is limited.

Coppersmith's attack describes a class of cryptographic attacks on the public-key cryptosystem RSA based on the Coppersmith method. Particular applications of the Coppersmith method for attacking RSA include cases when the public exponent e is small or when partial knowledge of a prime factor of the secret key is available.

In computer science, tabulation hashing is a method for constructing universal families of hash functions by combining table lookup with exclusive or operations. It was first studied in the form of Zobrist hashing for computer games; later work by Carter and Wegman extended this method to arbitrary fixed-length keys. Generalizations of tabulation hashing have also been developed that can handle variable-length keys such as text strings.

In number theory, Berlekamp's root finding algorithm, also called the Berlekamp–Rabin algorithm, is the probabilistic method of finding roots of polynomials over a field $. The method was discovered by Elwyn Berlekamp in 1970 as an auxiliary to the algorithm for polynomial factorization over finite fields. The algorithm was later modified by Rabin for arbitrary finite fields in 1979. The method was also independently discovered before Berlekamp by other researchers.$

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] The Art of Computer Science Volume 3 Sorting and Searching, Chapter 6.4, exercise 20, Donald Knuth

[1]