Leftover hash lemma

Last updated

The leftover hash lemma is a lemma in cryptography first stated by Russell Impagliazzo, Leonid Levin, and Michael Luby. [1]

Imagine that you have a secret key X that has n uniform random bits, and you would like to use this secret key to encrypt a message. Unfortunately, you were a bit careless with the key, and know that an adversary was able to learn the values of some t < n bits of that key, but you do not know which t bits. Can you still use your key, or do you have to throw it away and choose a new key? The leftover hash lemma tells us that we can produce a key of about nt bits, over which the adversary has almost no knowledge. Since the adversary knows all but nt bits, this is almost optimal.

More precisely, the leftover hash lemma tells us that we can extract a length asymptotic to (the min-entropy of X) bits from a random variable X that are almost uniformly distributed. In other words, an adversary who has some partial knowledge about X, will have almost no knowledge about the extracted value. That is why this is also called privacy amplification (see privacy amplification section in the article Quantum key distribution).

Randomness extractors achieve the same result, but use (normally) less randomness.

Let X be a random variable over and let . Let be a 2-universal hash function. If

then for S uniform over and independent of X, we have:

where U is uniform over and independent of S. [2]

is the min-entropy of X, which measures the amount of randomness X has. The min-entropy is always less than or equal to the Shannon entropy. Note that is the probability of correctly guessing X. (The best guess is to guess the most probable value.) Therefore, the min-entropy measures how difficult it is to guess X.

is a statistical distance between X and Y.

See also

Related Research Articles

<span class="mw-page-title-main">Entropy (information theory)</span> Expected amount of information needed to specify the output of a stochastic data source

In information theory, the entropy of a random variable is the average level of "information", "surprise", or "uncertainty" inherent to the variable's possible outcomes. Given a discrete random variable , which takes values in the set and is distributed according to , the entropy is where denotes the sum over the variable's possible values. The choice of base for , the logarithm, varies for different applications. Base 2 gives the unit of bits, while base e gives "natural units" nat, and base 10 gives units of "dits", "bans", or "hartleys". An equivalent definition of entropy is the expected value of the self-information of a variable.

<span class="mw-page-title-main">Central limit theorem</span> Fundamental theorem in probability theory and statistics

In probability theory, the central limit theorem (CLT) states that, under appropriate conditions, the distribution of a normalized version of the sample mean converges to a standard normal distribution. This holds even if the original variables themselves are not normally distributed. There are several versions of the CLT, each applying in the context of different conditions.

In the mathematical field of real analysis, the monotone convergence theorem is any of a number of related theorems proving the good convergence behaviour of monotonic sequences, i.e. sequences that are non-increasing, or non-decreasing. In its simplest form, it says that a non-decreasing bounded-above sequence of real numbers converges to its smallest upper bound, its supremum. Likewise, a non-increasing bounded-below sequence converges to its largest lower bound, its infimum. In particular, infinite sums of non-negative numbers converge to the supremum of the partial sums if and only if the partial sums are bounded.

In information theory, the asymptotic equipartition property (AEP) is a general property of the output samples of a stochastic source. It is fundamental to the concept of typical set used in theories of data compression.

In mathematics, Fatou's lemma establishes an inequality relating the Lebesgue integral of the limit inferior of a sequence of functions to the limit inferior of integrals of these functions. The lemma is named after Pierre Fatou.

In probability theory, the conditional expectation, conditional expected value, or conditional mean of a random variable is its expected value evaluated with respect to the conditional probability distribution. If the random variable can take on only a finite number of values, the "conditions" are that the variable can only take on a subset of those values. More formally, in the case when the random variable is defined over a discrete probability space, the "conditions" are a partition of this probability space.

In mathematics, subadditivity is a property of a function that states, roughly, that evaluating the function for the sum of two elements of the domain always returns something less than or equal to the sum of the function's values at each element. There are numerous examples of subadditive functions in various areas of mathematics, particularly norms and square roots. Additive maps are special cases of subadditive functions.

In mathematical statistics, the Kullback–Leibler (KL) divergence, denoted , is a type of statistical distance: a measure of how one probability distribution P is different from a second, reference probability distribution Q. Mathematically, it is defined as

In information theory, the Rényi entropy is a quantity that generalizes various notions of entropy, including Hartley entropy, Shannon entropy, collision entropy, and min-entropy. The Rényi entropy is named after Alfréd Rényi, who looked for the most general way to quantify information while preserving additivity for independent events. In the context of fractal dimension estimation, the Rényi entropy forms the basis of the concept of generalized dimensions.

Differential entropy is a concept in information theory that began as an attempt by Claude Shannon to extend the idea of (Shannon) entropy of a random variable, to continuous probability distributions. Unfortunately, Shannon did not derive this formula, and rather just assumed it was the correct continuous analogue of discrete entropy, but it is not. The actual continuous version of discrete entropy is the limiting density of discrete points (LDDP). Differential entropy is commonly encountered in the literature, but it is a limiting case of the LDDP, and one that loses its fundamental association with discrete entropy.

In the theory of probability, the Glivenko–Cantelli theorem, named after Valery Ivanovich Glivenko and Francesco Paolo Cantelli, describes the asymptotic behaviour of the empirical distribution function as the number of independent and identically distributed observations grows. Specifically, the empirical distribution function converges uniformly to the true distribution function almost surely.

In mathematics and computing, universal hashing refers to selecting a hash function at random from a family of hash functions with a certain mathematical property. This guarantees a low number of collisions in expectation, even if the data is chosen by an adversary. Many universal families are known, and their evaluation is often very efficient. Universal hashing has numerous uses in computer science, for example in implementations of hash tables, randomized algorithms, and cryptography.

In information theory, information dimension is an information measure for random vectors in Euclidean space, based on the normalized entropy of finely quantized versions of the random vectors. This concept was first introduced by Alfréd Rényi in 1959.

In the mathematical discipline of graph theory, the expander walk sampling theorem intuitively states that sampling vertices in an expander graph by doing relatively short random walk can simulate sampling the vertices independently from a uniform distribution. The earliest version of this theorem is due to Ajtai, Komlós & Szemerédi (1987), and the more general version is typically attributed to Gillman (1998).

A randomness extractor, often simply called an "extractor", is a function, which being applied to output from a weak entropy source, together with a short, uniformly random seed, generates a highly random output that appears independent from the source and uniformly distributed. Examples of weakly random sources include radioactive decay or thermal noise; the only restriction on possible sources is that there is no way they can be fully controlled, calculated or predicted, and that a lower bound on their entropy rate can be established. For a given source, a randomness extractor can even be considered to be a true random number generator (TRNG); but there is no single extractor that has been proven to produce truly random output from any type of weakly random source.

In theoretical computer science, the algorithmic Lovász local lemma gives an algorithmic way of constructing objects that obey a system of constraints with limited dependence.

<span class="mw-page-title-main">Differential privacy</span> Methods of safely sharing general data

Differential privacy (DP) is a mathematically rigorous framework for releasing statistical information about datasets while protecting the privacy of individual data subjects. It enables a data holder to share aggregate patterns of the group while limiting information that is leaked about specific individuals. This is done by injecting carefully calibrated noise into statistical computations such that the utility of the statistic is preserved while provably limiting what can be inferred about any individual in the dataset.

In theoretical computer science, the term isolation lemma refers to randomized algorithms that reduce the number of solutions to a problem to one, should a solution exist. This is achieved by constructing random constraints such that, with non-negligible probability, exactly one solution satisfies these additional constraints if the solution space is not empty. Isolation lemmas have important applications in computer science, such as the Valiant–Vazirani theorem and Toda's theorem in computational complexity theory.

Fuzzy extractors are a method that allows biometric data to be used as inputs to standard cryptographic techniques, to enhance computer security. "Fuzzy", in this context, refers to the fact that the fixed values required for cryptography will be extracted from values close to but not identical to the original key, without compromising the security required. One application is to encrypt and authenticate users records, using the biometric inputs of the user as a key.

The min-entropy, in information theory, is the smallest of the Rényi family of entropies, corresponding to the most conservative way of measuring the unpredictability of a set of outcomes, as the negative logarithm of the probability of the most likely outcome. The various Rényi entropies are all equal for a uniform distribution, but measure the unpredictability of a nonuniform distribution in different ways. The min-entropy is never greater than the ordinary or Shannon entropy and that in turn is never greater than the Hartley or max-entropy, defined as the logarithm of the number of outcomes with nonzero probability.

References

  1. Impagliazzo, Russell; Levin, Leonid A.; Luby, Michael (1989), "Pseudo-random Generation from one-way functions", in Johnson, David S. (ed.), Proceedings of the 21st Annual ACM Symposium on Theory of Computing, May 14-17, 1989, Seattle, Washington, USA, {ACM}, pp. 12–24, doi:10.1145/73007.73009, S2CID   18587852
  2. Rubinfeld, Ronnit; Drucker, Andy (April 30, 2008), "Lecture 22: The Leftover Hash Lemma and Explicit Extractions" (PDF), Lecture notes for MIT course 6.842, Randomness and Computation, MIT, retrieved 2019-02-19