Empirical measure

Last updated February 09, 2024

In probability theory, an empirical measure is a random measure arising from a particular realization of a (usually finite) sequence of random variables. The precise definition is found below. Empirical measures are relevant to mathematical statistics.

The motivation for studying empirical measures is that it is often impossible to know the true underlying probability measure $P$. Instead, we collect observations $X_{1},X_{2},\dots ,X_{n}$ and compute relative frequencies. We can then estimate $P$, or a related distribution function $F$, by means of the empirical measure or the empirical distribution function, respectively. Under certain conditions these estimates are uniformly good, and theorems in the area of empirical processes provide rates for this convergence.

Definition

Let $X_{1},X_{2},\dots$ be a sequence of independent identically distributed random variables with values in the state space S with probability distribution P.

Definition

The empirical measure $P_{n}$ is defined for measurable subsets of S and given by

P_{n}(A)={\frac {1}{n}}\sum _{i=1}^{n}I_{A}(X_{i})={\frac {1}{n}}\sum _{i=1}^{n}\delta _{X_{i}}(A)

where $I_{A}$ is the indicator function and $\delta _{X}$ is the Dirac measure.
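As an illustration of the definition, the following minimal Python sketch computes $P_{n}(A)$ as the fraction of observations that fall in A. The function name `empirical_measure` and the representation of the set A as a boolean predicate are assumptions made for this example, not part of the definition.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def empirical_measure(sample: Sequence[T], indicator: Callable[[T], bool]) -> float:
    """Return P_n(A) = (1/n) * sum_i I_A(X_i).

    `indicator(x)` plays the role of I_A(x): it is True exactly when x lies in A.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    # Count the observations in A and divide by the sample size.
    return sum(1 for x in sample if indicator(x)) / n


# Hypothetical observations; A is the interval [0, 1].
sample = [0.2, 1.5, -0.7, 0.9, 2.1]
p_n = empirical_measure(sample, lambda x: 0.0 <= x <= 1.0)
print(p_n)  # 0.4 (two of the five observations lie in [0, 1])
```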

Properties

For a fixed measurable set A, nP_n(A) is a binomial random variable with mean nP(A) and variance nP(A)(1 − P(A)).
- In particular, P_n(A) is an unbiased estimator of P(A).
For a fixed partition $A_{i}$ of S, the random variables $Y_{i}=nP_{n}(A_{i})$ jointly follow a multinomial distribution with event probabilities $P(A_{i})$.
- The covariance matrix of this multinomial distribution is $\operatorname {Cov} (Y_{i},Y_{j})=nP(A_{i})(\delta _{ij}-P(A_{j}))$, where $\delta _{ij}$ is the Kronecker delta.
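The counts $Y_{i}=nP_{n}(A_{i})$ over a partition can be sketched in a few lines of Python. The example below assumes real-valued observations and a partition of the real line into half-open intervals given by sorted cut points; the helper name `cell_counts` and the data are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence


def cell_counts(sample: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Return Y_i = n * P_n(A_i) for the partition
    (-inf, e_0], (e_0, e_1], ..., (e_{k-1}, +inf) defined by sorted `edges`."""
    counts = [0] * (len(edges) + 1)
    for x in sample:
        # bisect_left gives the number of edges strictly below x, which is
        # the index of the right-closed cell that contains x.
        counts[bisect_left(edges, x)] += 1
    return counts


sample = [0.2, 1.5, -0.7, 0.9, 2.1, 0.0]
counts = cell_counts(sample, [0.0, 1.0])
print(counts)       # [2, 2, 2]
print(sum(counts))  # 6, the sample size n
```

Because the cells partition the state space, the counts always sum to n, which is the constraint behind the negative off-diagonal covariances of the multinomial distribution.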

Definition

${\bigl (}P_{n}(c){\bigr )}_{c\in {\mathcal {C}}}$ is the empirical measure indexed by ${\mathcal {C}}$, a collection of measurable subsets of S.

To generalize this notion further, observe that the empirical measure $P_{n}$ maps measurable functions $f:S\to \mathbb {R}$ to their empirical mean,

f\mapsto P_{n}f=\int _{S}f\,dP_{n}={\frac {1}{n}}\sum _{i=1}^{n}f(X_{i})

In particular, the empirical measure of A is simply the empirical mean of the indicator function, P_n(A) = P_nI_A.

For a fixed measurable function $f$ , $P_{n}f$ is a random variable with mean $\mathbb {E} f$ and variance ${\frac {1}{n}}\mathbb {E} (f-\mathbb {E} f)^{2}$ .
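The map $f\mapsto P_{n}f$ can be illustrated with a short Python sketch. The name `empirical_mean` and the sample values are assumptions for this example; passing an indicator function recovers $P_{n}(A)$.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def empirical_mean(sample: Sequence[T], f: Callable[[T], float]) -> float:
    """Return P_n f = (1/n) * sum_i f(X_i), the integral of f against P_n."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    return sum(f(x) for x in sample) / n


sample = [1.0, 2.0, 3.0, 4.0]

# Empirical mean of f(x) = x^2: (1 + 4 + 9 + 16) / 4.
second_moment = empirical_mean(sample, lambda x: x * x)
print(second_moment)  # 7.5

# With f = I_A for A = (2, +inf), the empirical mean equals P_n(A).
p_n_A = empirical_mean(sample, lambda x: 1.0 if x > 2.0 else 0.0)
print(p_n_A)  # 0.5
```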

By the strong law of large numbers, P_n(A) converges to P(A) almost surely for fixed A. Similarly $P_{n}f$ converges to $\mathbb {E} f$ almost surely for a fixed measurable function $f$ . The problem of uniform convergence of P_n to P was open until Vapnik and Chervonenkis solved it in 1968.^[1]

If the class ${\mathcal {C}}$ (or ${\mathcal {F}}$ ) is Glivenko–Cantelli with respect to P then P_n converges to P uniformly over $c\in {\mathcal {C}}$ (or $f\in {\mathcal {F}}$ ). In other words, with probability 1 we have

\|P_{n}-P\|_{\mathcal {C}}=\sup _{c\in {\mathcal {C}}}|P_{n}(c)-P(c)|\to 0,

\|P_{n}-P\|_{\mathcal {F}}=\sup _{f\in {\mathcal {F}}}|P_{n}f-\mathbb {E} f|\to 0.

Empirical distribution function

The empirical distribution function provides an example of empirical measures. For real-valued iid random variables $X_{1},\dots ,X_{n}$ it is given by

F_{n}(x)=P_{n}((-\infty ,x])=P_{n}I_{(-\infty ,x]}.

In this case, empirical measures are indexed by the class ${\mathcal {C}}=\{(-\infty ,x]:x\in \mathbb {R} \}.$ It has been shown that ${\mathcal {C}}$ is a uniform Glivenko–Cantelli class; in particular,

\sup _{F}\|F_{n}(x)-F(x)\|_{\infty }\to 0

with probability 1.
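The uniform convergence of $F_{n}$ to $F$ can be observed numerically. The sketch below is an illustration under stated assumptions: the true distribution is taken to be Uniform(0, 1), the helper names `edf`, `sup_distance` and `uniform_cdf` are hypothetical, and the supremum formula used is valid for a continuous $F$.

```python
import random
from bisect import bisect_right
from typing import Callable, Sequence


def edf(sample: Sequence[float]) -> Callable[[float], float]:
    """Return F_n, where F_n(x) is the fraction of observations <= x."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


def sup_distance(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """Return sup_x |F_n(x) - F(x)| for a continuous cdf F.

    F_n is a step function, so the supremum is approached at a jump point:
    it suffices to compare F with the value of F_n at and just before
    each order statistic.
    """
    xs = sorted(sample)
    n = len(xs)
    return max(
        max((i + 1) / n - cdf(x), cdf(x) - i / n)
        for i, x in enumerate(xs)
    )


def uniform_cdf(x: float) -> float:
    """Distribution function of Uniform(0, 1) (assumed true distribution)."""
    return min(1.0, max(0.0, x))


rng = random.Random(0)  # fixed seed so that runs are reproducible
for n in (10, 100, 1000, 10000):
    sample = [rng.random() for _ in range(n)]
    print(n, round(sup_distance(sample, uniform_cdf), 4))
```

The printed distances should tend to shrink as n grows, in line with the almost sure uniform convergence stated above; individual runs can fluctuate for small n.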

References

[1] Vapnik, V.; Chervonenkis, A. (1968). "Uniform convergence of frequencies of occurrence of events to their probabilities". Dokl. Akad. Nauk SSSR. 181.
