Dudley's entropy integral

Dudley's entropy integral is a concept in probability theory that relates the metric entropy of a metric space to the concentration of measure phenomenon. It is named after the mathematician R. M. Dudley, who introduced the integral as part of his work on the uniform central limit theorem.

Definition

Dudley's entropy integral is defined for a metric space $(T, d)$. Given $\varepsilon > 0$, an $\varepsilon$-covering of $T$ is a collection of balls of radius $\varepsilon$ whose union contains $T$, and the entropy of $T$ at scale $\varepsilon$ is the logarithm of the minimum number of such balls. Dudley's entropy integral is then given by the formula

$$\int_0^\infty \sqrt{\log N(T, d, \varepsilon)} \, d\varepsilon,$$

where $N(T, d, \varepsilon)$ is the covering number, i.e. the minimum number of balls of radius $\varepsilon$ with respect to the metric $d$ that cover the space $T$. [1]
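
For a concrete picture of these quantities, the covering number of a finite point set can be upper-bounded by a greedy cover and the integral approximated by a Riemann sum. The following Python sketch is purely illustrative and is not drawn from the cited sources; the function names and the random test set are arbitrary choices.

```python
# Minimal sketch (not from the cited sources): upper-bound the covering
# number N(T, d, eps) of a finite set T by a greedy cover, then approximate
# Dudley's entropy integral with a Riemann sum.
import numpy as np

def covering_number(dists, eps):
    """Greedy upper bound on N(T, d, eps) from a pairwise-distance matrix."""
    remaining = np.arange(len(dists))
    count = 0
    while remaining.size:
        center = remaining[0]
        # Keep only the points farther than eps from the chosen center.
        remaining = remaining[dists[remaining, center] > eps]
        count += 1
    return count

def entropy_integral(dists, num_eps=100):
    """Riemann-sum approximation of the integral of sqrt(log N(T, d, eps)).

    The integrand vanishes once eps reaches diam(T), since a single ball of
    that radius covers T, so the integration range is truncated there.
    """
    diam = dists.max()
    eps_grid = np.linspace(diam / num_eps, diam, num_eps)
    step = eps_grid[1] - eps_grid[0]
    return sum(np.sqrt(np.log(covering_number(dists, e))) * step
               for e in eps_grid)

# Example: 200 uniform random points in the unit square.
rng = np.random.default_rng(0)
T = rng.uniform(size=(200, 2))
dists = np.linalg.norm(T[:, None, :] - T[None, :, :], axis=2)
print(f"Entropy integral of T: {entropy_integral(dists):.3f}")
```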

Mathematical background

Dudley's entropy integral arises in the context of empirical processes and Gaussian processes, where it is used to bound the supremum of a stochastic process. Its significance lies in providing a metric-entropy measure of the complexity of a space with respect to a given probability distribution. More specifically, the expected supremum of a sub-Gaussian process $(X_t)_{t \in T}$ is bounded by the entropy integral up to a universal constant $C$:

$$\mathbb{E} \sup_{t \in T} X_t \le C \int_0^\infty \sqrt{\log N(T, d, \varepsilon)} \, d\varepsilon.$$

Additionally, function classes with a finite entropy integral satisfy a uniform central limit theorem. [2] [1]
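
As a rough numerical check of this bound (again an illustrative sketch rather than anything from the cited sources), one can compare a Monte Carlo estimate of the expected supremum of the canonical Gaussian process $X_t = \langle g, t \rangle$, $g \sim N(0, I_n)$, with the entropy integral computed from the same greedy covering-number estimate as above; this process is sub-Gaussian with respect to the Euclidean metric, so the two figures should agree up to the universal constant $C$.

```python
# Sketch (illustrative only): Monte Carlo check of Dudley's bound for the
# canonical Gaussian process X_t = <g, t> on a finite set T of unit vectors.
# The universal constant C in the inequality is not computed here.
import numpy as np

rng = np.random.default_rng(1)

# T: 200 random directions on the unit sphere in R^5 (an arbitrary test set).
T = rng.standard_normal((200, 5))
T /= np.linalg.norm(T, axis=1, keepdims=True)

# Monte Carlo estimate of E sup_{t in T} <g, t> over 20,000 Gaussian draws.
g = rng.standard_normal((20000, 5))
expected_sup = (g @ T.T).max(axis=1).mean()

# Entropy integral via the same greedy covering-number estimate as above.
def covering_number(dists, eps):
    remaining = np.arange(len(dists))
    count = 0
    while remaining.size:
        remaining = remaining[dists[remaining, remaining[0]] > eps]
        count += 1
    return count

dists = np.linalg.norm(T[:, None, :] - T[None, :, :], axis=2)
diam = dists.max()
eps_grid = np.linspace(diam / 100, diam, 100)
step = eps_grid[1] - eps_grid[0]
integral = sum(np.sqrt(np.log(covering_number(dists, e))) * step
               for e in eps_grid)

print(f"E sup X_t (Monte Carlo): {expected_sup:.2f}")
print(f"Entropy integral:        {integral:.2f}")
```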

Related Research Articles

Uniform convergence: Mode of convergence of a function sequence

In the mathematical field of analysis, uniform convergence is a mode of convergence of functions stronger than pointwise convergence, in the sense that the convergence is uniform over the domain. A sequence of functions $(f_n)$ converges uniformly to a limiting function $f$ on a set $E$ if, given any arbitrarily small positive number $\varepsilon$, a number $N$ can be found such that each of the functions $f_N, f_{N+1}, f_{N+2}, \ldots$ differs from $f$ by no more than $\varepsilon$ at every point $x$ in $E$. Described in an informal way, if $f_n$ converges to $f$ uniformly, then how quickly the functions $f_n$ approach $f$ is "uniform" throughout $E$ in the following sense: in order to guarantee that $f_n(x)$ differs from $f(x)$ by less than a chosen distance $\varepsilon$, we only need to make sure that $n$ is larger than or equal to a certain $N$, which we can find without knowing the value of $x$ in advance. In other words, there exists a number $N = N(\varepsilon)$ that could depend on $\varepsilon$ but is independent of $x$, such that choosing $n \ge N$ will ensure that $|f_n(x) - f(x)| < \varepsilon$ for all $x \in E$. In contrast, pointwise convergence of $f_n$ to $f$ merely guarantees that for any $x \in E$ given in advance, we can find $N = N(\varepsilon, x)$ such that, for that particular $x$, $f_n(x)$ falls within $\varepsilon$ of $f(x)$ whenever $n \ge N$.

Law of large numbers: Averages of repeated trials converge to the expected value

In probability theory, the law of large numbers (LLN) is a mathematical theorem that states that the average of the results obtained from a large number of independent and identical random samples converges to the true value, if it exists. More formally, the LLN states that given a sample of independent and identically distributed values, the sample mean converges to the true mean.

In information theory, the asymptotic equipartition property (AEP) is a general property of the output samples of a stochastic source. It is fundamental to the concept of typical set used in theories of data compression.

In information theory, the typical set is a set of sequences whose probability is close to two raised to the negative power of the entropy of their source distribution. That this set has total probability close to one is a consequence of the asymptotic equipartition property (AEP) which is a kind of law of large numbers. The notion of typicality is only concerned with the probability of a sequence and not the actual sequence itself.

In calculus and real analysis, absolute continuity is a smoothness property of functions that is stronger than continuity and uniform continuity. The notion of absolute continuity allows one to obtain generalizations of the relationship between the two central operations of calculus, differentiation and integration. This relationship is commonly characterized in the framework of Riemann integration, but with absolute continuity it may be formulated in terms of Lebesgue integration. For real-valued functions on the real line, two interrelated notions appear: absolute continuity of functions and absolute continuity of measures. These two notions are generalized in different directions. The usual derivative of a function is related to the Radon–Nikodym derivative, or density, of a measure. We have the following chains of inclusions for functions over a compact subset of the real line: continuously differentiable ⊆ Lipschitz continuous ⊆ absolutely continuous ⊆ bounded variation ⊆ differentiable almost everywhere, and absolutely continuous ⊆ uniformly continuous ⊆ continuous.

Vapnik–Chervonenkis theory was developed during 1960–1990 by Vladimir Vapnik and Alexey Chervonenkis. The theory is a form of computational learning theory, which attempts to explain the learning process from a statistical point of view.

In mathematics, the Radon–Nikodym theorem is a result in measure theory that expresses the relationship between two measures defined on the same measurable space. A measure is a set function that assigns a consistent magnitude to the measurable subsets of a measurable space. Examples of a measure include area and volume, where the subsets are sets of points; or the probability of an event, which is a subset of possible outcomes within a wider probability space.

A prior probability distribution of an uncertain quantity, often simply called the prior, is its assumed probability distribution before some evidence is taken into account. For example, the prior could be the probability distribution representing the relative proportions of voters who will vote for a particular politician in a future election. The unknown quantity may be a parameter of the model or a latent variable rather than an observable variable.

Minkowski–Bouligand dimension: Method of determining fractal dimension

In fractal geometry, the Minkowski–Bouligand dimension, also known as Minkowski dimension or box-counting dimension, is a way of determining the fractal dimension of a set $S$ in a Euclidean space $\mathbb{R}^n$, or more generally in a metric space $(X, d)$. It is named after the German mathematician Hermann Minkowski and the French mathematician Georges Bouligand.
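
As a hedged illustration (not part of the original article), the box-counting recipe can be run on the middle-thirds Cantor set, whose dimension log 2 / log 3 ≈ 0.631 is known in closed form:

```python
# Sketch (illustrative only): box-counting estimate of the
# Minkowski–Bouligand dimension of the middle-thirds Cantor set, whose true
# dimension is log 2 / log 3 ≈ 0.631.
import numpy as np

rng = np.random.default_rng(0)

# Sample Cantor-set points via random ternary expansions with digits {0, 2}.
digits = rng.choice([0, 2], size=(100000, 20))
points = (digits * 3.0 ** -np.arange(1, 21)).sum(axis=1)

# Count occupied boxes at successively shrinking scales eps = 3^-k.
for k in [4, 6, 8, 10]:
    eps = 3.0 ** -k
    boxes = np.unique(np.floor(points / eps)).size
    print(f"eps = 3^-{k:2d}: log N(eps) / log(1/eps) = "
          f"{np.log(boxes) / np.log(1 / eps):.3f}")
```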

In mathematics, concentration of measure is a principle that is applied in measure theory, probability and combinatorics, and has consequences for other fields such as Banach space theory. Informally, it states that "A random variable that depends in a Lipschitz way on many independent variables is essentially constant".

In mathematics, the Bernoulli scheme or Bernoulli shift is a generalization of the Bernoulli process to more than two possible outcomes. Bernoulli schemes appear naturally in symbolic dynamics, and are thus important in the study of dynamical systems. Many important dynamical systems exhibit a repellor that is the product of the Cantor set and a smooth manifold, and the dynamics on the Cantor set are isomorphic to that of the Bernoulli shift. This is essentially the Markov partition. The term shift is in reference to the shift operator, which may be used to study Bernoulli schemes. The Ornstein isomorphism theorem shows that Bernoulli shifts are isomorphic when their entropy is equal.

Donsker's theorem

In probability theory, Donsker's theorem, named after Monroe D. Donsker, is a functional extension of the central limit theorem for empirical distribution functions. Specifically, the theorem states that an appropriately centered and scaled version of the empirical distribution function converges to a Gaussian process.
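
A minimal numerical sketch of this statement (illustrative, not from the article's sources): for Uniform(0, 1) samples the limiting Gaussian process is the Brownian bridge, so the scaled empirical process $\sqrt{n}(F_n - F)$ should reproduce the Brownian bridge covariance $\min(s, t) - st$.

```python
# Sketch (illustrative only): for Uniform(0, 1) samples, the scaled
# empirical process sqrt(n) * (F_n - F) should have the covariance
# min(s, t) - s*t of a Brownian bridge, as Donsker's theorem predicts.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 1000, 5000
s, t = 0.3, 0.7

samples = rng.uniform(size=(reps, n))
# Centered, scaled empirical CDF evaluated at s and t for each replication.
G_s = np.sqrt(n) * ((samples <= s).mean(axis=1) - s)
G_t = np.sqrt(n) * ((samples <= t).mean(axis=1) - t)

print(f"Empirical covariance:       {np.cov(G_s, G_t)[0, 1]:.3f}")
print(f"Brownian bridge covariance: {min(s, t) - s * t:.3f}")
```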

In information theory, information dimension is an information measure for random vectors in Euclidean space, based on the normalized entropy of finely quantized versions of the random vectors. This concept was first introduced by Alfréd Rényi in 1959.

In mathematics, the topological entropy of a topological dynamical system is a nonnegative extended real number that is a measure of the complexity of the system. Topological entropy was first introduced in 1965 by Adler, Konheim and McAndrew. Their definition was modelled after the definition of the Kolmogorov–Sinai, or metric entropy. Later, Dinaburg and Rufus Bowen gave a different, weaker definition reminiscent of the Hausdorff dimension. The second definition clarified the meaning of the topological entropy: for a system given by an iterated function, the topological entropy represents the exponential growth rate of the number of distinguishable orbits of the iterates. An important variational principle relates the notions of topological and measure-theoretic entropy.

In mathematics, the Lebesgue differentiation theorem is a theorem of real analysis, which states that for almost every point, the value of an integrable function is the limiting average taken around the point. The theorem is named for Henri Lebesgue.

In mathematics, the Vitali covering lemma is a combinatorial and geometric result commonly used in measure theory of Euclidean spaces. This lemma is an intermediate step, of independent interest, in the proof of the Vitali covering theorem. The covering theorem is credited to the Italian mathematician Giuseppe Vitali. The theorem states that it is possible to cover, up to a Lebesgue-negligible set, a given subset $E$ of $\mathbb{R}^d$ by a disjoint family extracted from a Vitali covering of $E$.

In mathematics, Schilder's theorem is a generalization of the Laplace method from integrals on $\mathbb{R}^n$ to functional Wiener integration. The theorem is used in the large deviations theory of stochastic processes. Roughly speaking, out of Schilder's theorem one gets an estimate for the probability that a (scaled-down) sample path of Brownian motion will stray far from the mean path. This statement is made precise using rate functions. Schilder's theorem is generalized by the Freidlin–Wentzell theorem for Itô diffusions.

In mathematics, a covering number is the number of balls of a given size needed to completely cover a given space, with possible overlaps between the balls. The covering number quantifies the size of a set and can be applied to general metric spaces. Two related concepts are the packing number, the number of disjoint balls that fit in a space, and the metric entropy, the number of points that fit in a space when constrained to lie at some fixed minimum distance apart.

In probability theory, Dudley's theorem is a result relating the expected upper bound and regularity properties of a Gaussian process to its entropy and covariance structure.

Distributional learning theory, or learning of probability distributions, is a framework in computational learning theory. It was proposed by Michael Kearns, Yishay Mansour, Dana Ron, Ronitt Rubinfeld, Robert Schapire, and Linda Sellie in 1994, and it was inspired by the PAC framework introduced by Leslie Valiant.

References

  1. Vershynin, R. (2018). High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press.
  2. van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge University Press.