In quantum mechanics, information theory, and Fourier analysis, the entropic uncertainty or Hirschman uncertainty is defined as the sum of the temporal and spectral Shannon entropies. It turns out that Heisenberg's uncertainty principle can be expressed as a lower bound on the sum of these entropies. This is stronger than the usual statement of the uncertainty principle in terms of the product of standard deviations.
In 1957, [1] Hirschman considered a function f and its Fourier transform g such that

g(y) ≈ ∫ exp(−2πixy) f(x) dx,   f(x) ≈ ∫ exp(2πixy) g(y) dy,

where the "≈" indicates convergence in L², and normalized so that (by Plancherel's theorem)

∫ |f(x)|² dx = ∫ |g(y)|² dy = 1.
He showed that for any such functions the sum of the Shannon entropies is non-negative,

H(|f|²) + H(|g|²) ≡ −∫ |f(x)|² log |f(x)|² dx − ∫ |g(y)|² log |g(y)|² dy ≥ 0.
A tighter bound,

H(|f|²) + H(|g|²) ≥ log(e/2),

was conjectured by Hirschman [1] and Everett, [2] proven in 1975 by W. Beckner [3] and in the same year interpreted as a generalized quantum mechanical uncertainty principle by Białynicki-Birula and Mycielski. [4] The equality holds in the case of Gaussian distributions. [5] Note, however, that the above entropic uncertainty function is distinctly different from the quantum von Neumann entropy represented in phase space.
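The saturation of this bound by Gaussians, and the strict inequality for other functions, can be illustrated numerically. The following is a minimal NumPy sketch (grid sizes and the choice of test functions are illustrative only) that evaluates the entropy sum for a Gaussian that equals its own transform and for a box function, whose transform is sin(πy)/(πy):

```python
import numpy as np

def entropy(p, dx):
    """Differential entropy -sum p*log(p)*dx on a uniform grid (natural log)."""
    q = np.where(p > 1e-300, p, 1e-300)  # guard against log(0); contributes ~0
    return -np.sum(q * np.log(q)) * dx

# Gaussian: f(x) = 2^(1/4) exp(-pi x^2) is L2-normalised and is its own transform
# under the exp(-2*pi*i*x*y) convention, so |f|^2 = |g|^2 = sqrt(2) exp(-2 pi x^2).
x, dx = np.linspace(-8, 8, 200001, retstep=True)
f2 = np.sqrt(2) * np.exp(-2 * np.pi * x**2)
gauss_sum = 2 * entropy(f2, dx)

# Box: f = 1 on [-1/2, 1/2], so |f|^2 has entropy 0; g(y) = sin(pi y)/(pi y).
y, dy = np.linspace(-400, 400, 4000001, retstep=True)
box_sum = 0.0 + entropy(np.sinc(y)**2, dy)   # np.sinc(y) = sin(pi y)/(pi y)

print("log(e/2)       =", 1 - np.log(2))   # ~0.3069
print("Gaussian sum   =", gauss_sum)       # saturates the bound
print("box/sinc^2 sum =", box_sum)         # strictly above the bound
```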
The proof of this tight inequality depends on the so-called (q, p)-norm of the Fourier transformation. (Establishing this norm is the most difficult part of the proof.)
From this norm, one is able to establish a lower bound on the sum of the (differential) Rényi entropies, Hα(|f|²)+Hβ(|g|²), where 1/α + 1/β = 2, which generalize the Shannon entropies. For simplicity, we consider this inequality only in one dimension; the extension to multiple dimensions is straightforward and can be found in the literature cited.
The (q, p)-norm of the Fourier transform is defined to be [6]

‖F‖q,p = sup { ‖Ff‖q / ‖f‖p : f ∈ Lp(ℝ) },  where 1 < p ≤ 2 and 1/p + 1/q = 1.
In 1961, Babenko [7] found this norm for even integer values of q. Finally, in 1975, using Hermite functions as eigenfunctions of the Fourier transform, Beckner [3] proved that the value of this norm (in one dimension) for all q ≥ 2 is

‖F‖q,p = (p^(1/p) / q^(1/q))^(1/2).
Thus we have the Babenko–Beckner inequality that

‖Ff‖q ≤ (p^(1/p) / q^(1/q))^(1/2) ‖f‖p.
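As a quick numerical illustration of Beckner's constant, the following sketch (assuming the exp(−2πixy) convention, under which exp(−πx²) is its own transform, and the arbitrary illustrative choice p = 1.5) compares the ratio ‖Ff‖q/‖f‖p for a Gaussian with (p^(1/p)/q^(1/q))^(1/2):

```python
import numpy as np

p = 1.5
q = p / (p - 1)                 # conjugate exponent, 1/p + 1/q = 1
x, dx = np.linspace(-10, 10, 100001, retstep=True)
f = np.exp(-np.pi * x**2)       # Ff = f for the exp(-2*pi*i*x*y) convention

norm = lambda h, r: (np.sum(np.abs(h)**r) * dx)**(1 / r)
ratio = norm(f, q) / norm(f, p)             # ||Ff||_q / ||f||_p
beckner = (p**(1/p) / q**(1/q))**0.5        # Beckner's value of the (q,p)-norm

print(ratio, beckner)           # agree to grid accuracy; Gaussians are extremal
```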
From this inequality, an expression of the uncertainty principle in terms of the Rényi entropy can be derived. [6] [8]
Let 2α = p and 2β = q, so that 1/α + 1/β = 2 and 1/2 < α ≤ 1 ≤ β. Applying the Babenko–Beckner inequality to f and g = Ff, we have

(∫ |g(y)|^(2β) dy)^(1/(2β)) ≤ ((2α)^(1/(2α)) / (2β)^(1/(2β)))^(1/2) (∫ |f(x)|^(2α) dx)^(1/(2α)).
Squaring both sides and taking the logarithm, we get

(1/β) log ∫ |g(y)|^(2β) dy ≤ (1/2) log((2α)^(1/α) / (2β)^(1/β)) + (1/α) log ∫ |f(x)|^(2α) dx.
We can rewrite the condition 1/α + 1/β = 2 on α and β as

β/(1 − β) = −α/(1 − α).

Assume α < 1 < β (the case α = β = 1 is recovered below as a limit); then we multiply both sides by the negative quantity β/(1 − β) = −α/(1 − α), which reverses the sense of the inequality, to get

(1/(1 − β)) log ∫ |g(y)|^(2β) dy ≥ (β/(2(1 − β))) log((2α)^(1/α) / (2β)^(1/β)) − (1/(1 − α)) log ∫ |f(x)|^(2α) dx.
Rearranging terms, and writing the Rényi entropies as Hα(|f|²) = (1/(1 − α)) log ∫ |f(x)|^(2α) dx and Hβ(|g|²) = (1/(1 − β)) log ∫ |g(y)|^(2β) dy, yields an inequality in terms of the sum of Rényi entropies,

Hα(|f|²) + Hβ(|g|²) ≥ (1/2)(log(2α)/(α − 1) + log(2β)/(β − 1)) = (1/2)(log α/(α − 1) + log β/(β − 1)) − log 2,

where the last equality uses 1/(α − 1) + 1/(β − 1) = −2, which follows from 1/α + 1/β = 2.
Taking the limit of this last inequality as α, β → 1, with the substitutions Hα(|f|²) → H(|f|²) and Hβ(|g|²) → H(|g|²), yields the less general Shannon entropy inequality,

H(|f|²) + H(|g|²) ≥ log(e/2),  where g(y) ≈ ∫ exp(−2πixy) f(x) dx,
valid for any base of logarithm, as long as we choose an appropriate unit of information, bit, nat, etc.
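The Rényi bound and its Shannon limit can be checked numerically. The sketch below (with the illustrative choice α = 0.8, hence β = 4/3, and the same self-dual Gaussian as above) shows the Gaussian attaining the bound, which itself tends to log(e/2) as α, β → 1:

```python
import numpy as np

def renyi(p, dx, order):
    """Differential Renyi entropy of the given order on a uniform grid."""
    return np.log(np.sum(p**order) * dx) / (1 - order)

a = 0.8
b = a / (2 * a - 1)                        # enforces 1/a + 1/b = 2
bound = 0.5 * (np.log(a) / (a - 1) + np.log(b) / (b - 1)) - np.log(2)

# Gaussian pair: |f|^2 = |g|^2 = sqrt(2) exp(-2 pi x^2), variance 1/(4 pi).
x, dx = np.linspace(-8, 8, 200001, retstep=True)
f2 = np.sqrt(2) * np.exp(-2 * np.pi * x**2)
gauss = renyi(f2, dx, a) + renyi(f2, dx, b)

print("bound    =", bound)
print("Gaussian =", gauss)                 # equality, up to grid error
print("a,b -> 1 =", 1 - np.log(2))         # Shannon limit log(e/2)
```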
The constant will be different, though, for a different normalization of the Fourier transform (such as is usually used in physics, with normalizations chosen so that ħ = 1), i.e.,

H(|f|²) + H(|g|²) ≥ log(eπ)  for  g(y) ≈ (1/√(2π)) ∫ exp(−ixy) f(x) dx.
In this case, the dilation of the Fourier transform absolute squared by a factor of 2π simply adds log(2π) to its entropy.
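This shift is an instance of the general scaling property of differential entropy: if pc(y) = p(y/c)/c, then H(pc) = H(p) + log c. The sketch below checks it for c = 2π with a Gaussian; the grid is illustrative only.

```python
import numpy as np

def entropy(p, dx):
    q = np.where(p > 1e-300, p, 1e-300)
    return -np.sum(q * np.log(q)) * dx

c = 2 * np.pi
x, dx = np.linspace(-60, 60, 2000001, retstep=True)
p  = np.sqrt(2) * np.exp(-2 * np.pi * x**2)              # base density
pc = np.sqrt(2) * np.exp(-2 * np.pi * (x / c)**2) / c    # dilated by factor c

print(entropy(pc, dx) - entropy(p, dx), np.log(c))       # both ~1.8379
```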
The Gaussian or normal probability distribution plays an important role in the relationship between variance and entropy: it is a problem of the calculus of variations to show that this distribution maximizes entropy for a given variance, and at the same time minimizes the variance for a given entropy. In fact, for any probability density function φ on the real line, Shannon's entropy inequality specifies:

H(φ) ≤ (1/2) log(2πe V(φ)),
where H is the Shannon entropy and V is the variance, an inequality that is saturated only in the case of a normal distribution.
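A small tabulated check of this inequality for a few standard densities (entropies and variances entered analytically) makes the statement concrete:

```python
import numpy as np

cases = {
    # name: (differential entropy in nats, variance)
    "uniform(0,1)":   (0.0, 1/12),
    "exponential(1)": (1.0, 1.0),
    "normal(0,1)":    (0.5 * np.log(2 * np.pi * np.e), 1.0),
}
for name, (H, V) in cases.items():
    bound = 0.5 * np.log(2 * np.pi * np.e * V)   # Shannon's bound (1/2)log(2*pi*e*V)
    print(f"{name:15s} H = {H:.4f}  bound = {bound:.4f}")   # equality only for the normal
```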
Moreover, the Fourier transform of a Gaussian probability amplitude function is also Gaussian, and the absolute squares of both are Gaussian as well. This can then be used to derive the usual Robertson variance uncertainty inequality from the above entropic inequality, which shows that the entropic inequality is the stronger of the two. That is (for ħ = 1), exponentiating the Hirschman inequality and using Shannon's expression above,

2πe √(V(|f|²) V(|g|²)) ≥ exp(H(|f|²) + H(|g|²)) ≥ eπ,

so that √(V(|f|²) V(|g|²)) ≥ 1/2, i.e. σx σp ≥ ħ/2, which is Robertson's relation.
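This chain of inequalities can be probed numerically for a non-Gaussian state. The sketch below uses the first excited harmonic-oscillator state (an eigenfunction of the physics-convention Fourier transform, so its position and momentum densities coincide); the grid is illustrative. For this state σx σp = 3/2, while the entropic lower limit on that product evaluates to about 0.86, both above 1/2:

```python
import numpy as np

x, dx = np.linspace(-12, 12, 400001, retstep=True)
psi = np.sqrt(2) * np.pi**(-0.25) * x * np.exp(-x**2 / 2)   # first excited state
rho = psi**2                                # equals |g|^2 as well (eigenfunction)

def entropy(p, dx):
    q = np.where(p > 1e-300, p, 1e-300)
    return -np.sum(q * np.log(q)) * dx

H = entropy(rho, dx)                        # position and momentum entropies coincide
V = np.sum(rho * x**2) * dx                 # variance (the mean is zero)

print("sigma_x*sigma_p  =", V)                                  # = sqrt(V*V) = 1.5
print("exp(2H)/(2*pi*e) =", np.exp(2 * H) / (2 * np.pi * np.e)) # ~0.86, entropic limit
print("Robertson limit  =", 0.5)
```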
Hirschman [1] explained that entropy—his version of entropy was the negative of Shannon's—is a "measure of the concentration of [a probability distribution] in a set of small measure." Thus a low or large negative Shannon entropy means that a considerable mass of the probability distribution is confined to a set of small measure.
Note that this set of small measure need not be contiguous; a probability distribution can have several concentrations of mass in intervals of small measure, and the entropy may still be low no matter how widely scattered those intervals are. This is not the case with the variance: variance measures the concentration of mass about the mean of the distribution, and a low variance means that a considerable mass of the probability distribution is concentrated in a contiguous interval of small measure.
To formalize this distinction, we say that two probability density functions φ1 and φ2 are equimeasurable if

μ{x ∈ ℝ : φ1(x) ≥ ε} = μ{x ∈ ℝ : φ2(x) ≥ ε}  for every ε > 0,
where μ is the Lebesgue measure. Any two equimeasurable probability density functions have the same Shannon entropy, and in fact the same Rényi entropy of any order. The same is not true of variance, however. Any probability density function has a radially decreasing equimeasurable "rearrangement" whose variance is less (up to translation) than that of any other rearrangement of the function; and there exist rearrangements of arbitrarily high variance (all having the same entropy).
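The contrast between entropy and variance under rearrangement can be seen with a uniform density and a "split" equimeasurable counterpart; in the sketch below the separation d = 10 and the grid are arbitrary illustrative choices.

```python
import numpy as np

def entropy_and_variance(p, x, dx):
    q = np.where(p > 1e-300, p, 1e-300)
    H = -np.sum(q * np.log(q)) * dx          # Shannon entropy
    m = np.sum(p * x) * dx                   # mean
    V = np.sum(p * (x - m)**2) * dx          # variance
    return H, V

d = 10.0
x, dx = np.linspace(-12, 12, 2400001, retstep=True)
uniform = np.where(np.abs(x) <= 0.5, 1.0, 0.0)             # unit box on [-1/2, 1/2]
split = np.where(np.abs(np.abs(x) - d) <= 0.25, 1.0, 0.0)  # two width-1/2 pieces at +/-d

print(entropy_and_variance(uniform, x, dx))  # H ~ 0, V ~ 1/12
print(entropy_and_variance(split, x, dx))    # H ~ 0, V ~ d^2 + 1/48
```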