Big O in probability notation

Last updated

The order in probability notation is used in probability theory and statistical theory in direct parallel to the big-O notation that is standard in mathematics. Where the big-O notation deals with the convergence of sequences or sets of ordinary numbers, the order in probability notation deals with convergence of sets of random variables, where convergence is in the sense of convergence in probability. [1]

Contents

Definitions

Small o: convergence in probability

For a set of random variables Xn and a corresponding set of constants an (both indexed by n, which need not be discrete), the notation

means that the set of values Xn/an converges to zero in probability as n approaches an appropriate limit. Equivalently, Xn = op(an) can be written as Xn/an = op(1), i.e.

for every positive ε. [2]

Big O: stochastic boundedness

The notation

means that the set of values Xn/an is stochastically bounded. That is, for any ε > 0, there exists a finite M > 0 and a finite N > 0 such that

Comparison of the two definitions

The difference between the definition is subtle. If one uses the definition of the limit, one gets:

The difference lies in the δ: for stochastic boundedness, it suffices that there exists one (arbitrary large) δ to satisfy the inequality, and δ is allowed to be dependent on ε (hence the δε). On the other side, for convergence, the statement has to hold not only for one, but for any (arbitrary small) δ. In a sense, this means that the sequence must be bounded, with a bound that gets smaller as the sample size increases.

This suggests that if a sequence is op(1), then it is Op(1), i.e. convergence in probability implies stochastic boundedness. But the reverse does not hold.

Example

If is a stochastic sequence such that each element has finite variance, then

(see Theorem 14.4-1 in Bishop et al.)

If, moreover, is a null sequence for a sequence of real numbers, then converges to zero in probability by Chebyshev's inequality, so

Related Research Articles

<span class="mw-page-title-main">Dirac delta function</span> Generalized function whose value is zero everywhere except at zero

In mathematics, the Dirac delta distribution, also known as the unit impulse, is a generalized function or distribution over the real numbers, whose value is zero everywhere except at zero, and whose integral over the entire real line is equal to one.

In probability theory, the central limit theorem (CLT) establishes that, in many situations, for identically distributed independent samples, the standardized sample mean tends towards the standard normal distribution even if the original variables themselves are not normally distributed.

<span class="mw-page-title-main">Limit inferior and limit superior</span> Bounds of a sequence

In mathematics, the limit inferior and limit superior of a sequence can be thought of as limiting bounds on the sequence. They can be thought of in a similar fashion for a function. For a set, they are the infimum and supremum of the set's limit points, respectively. In general, when there are multiple objects around which a sequence, function, or set accumulates, the inferior and superior limits extract the smallest and largest of them; the type of object and the measure of size is context-dependent, but the notion of extreme limits is invariant. Limit inferior is also called infimum limit, limit infimum, liminf, inferior limit, lower limit, or inner limit; limit superior is also known as supremum limit, limit supremum, limsup, superior limit, upper limit, or outer limit.

In probability theory, there exist several different notions of convergence of random variables. The convergence of sequences of random variables to some limit random variable is an important concept in probability theory, and its applications to statistics and stochastic processes. The same concepts are known in more general mathematics as stochastic convergence and they formalize the idea that a sequence of essentially random or unpredictable events can sometimes be expected to settle down into a behavior that is essentially unchanging when items far enough into the sequence are studied. The different possible notions of convergence relate to how such a behavior can be characterized: two readily understood behaviors are that the sequence eventually takes a constant value, and that values in the sequence continue to change but can be described by an unchanging probability distribution.

<span class="mw-page-title-main">Law of large numbers</span> Averages of repeated trials converge to the expected value

In probability theory, the law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value and tends to become closer to the expected value as more trials are performed.

In information theory, the asymptotic equipartition property (AEP) is a general property of the output samples of a stochastic source. It is fundamental to the concept of typical set used in theories of data compression.

In mathematics, the limit of a function is a fundamental concept in calculus and analysis concerning the behavior of that function near a particular input.

<span class="mw-page-title-main">Vapnik–Chervonenkis theory</span> Branch of statistical computational learning theory

Vapnik–Chervonenkis theory was developed during 1960–1990 by Vladimir Vapnik and Alexey Chervonenkis. The theory is a form of computational learning theory, which attempts to explain the learning process from a statistical point of view.

In mathematics, the Radon–Nikodym theorem is a result in measure theory that expresses the relationship between two measures defined on the same measurable space. A measure is a set function that assigns a consistent magnitude to the measurable subsets of a measurable space. Examples of a measure include area and volume, where the subsets are sets of points; or the probability of an event, which is a subset of possible outcomes within a wider probability space.

In probability theory, a Chernoff bound is an exponentially decreasing upper bound on the tail of a random variable based on its moment generating function or exponential moments. The minimum of all such exponential bounds forms the Chernoff or Chernoff-Cramér bound, which may decay faster than exponential. The Chernoff bound is especially useful for sums of independent random variables, such as sums of Bernoulli random variables.

<span class="mw-page-title-main">Consistent estimator</span> Statistical estimator converging in probability to a true parameter as sample size increases

In statistics, a consistent estimator or asymptotically consistent estimator is an estimator—a rule for computing estimates of a parameter θ0—having the property that as the number of data points used increases indefinitely, the resulting sequence of estimates converges in probability to θ0. This means that the distributions of the estimates become more and more concentrated near the true value of the parameter being estimated, so that the probability of the estimator being arbitrarily close to θ0 converges to one.

In mathematics, more specifically measure theory, there are various notions of the convergence of measures. For an intuitive general sense of what is meant by convergence of measures, consider a sequence of measures μn on a space, sharing a common collection of measurable sets. Such a sequence might represent an attempt to construct 'better and better' approximations to a desired measure μ that is difficult to obtain directly. The meaning of 'better and better' is subject to all the usual caveats for taking limits; for any error tolerance ε > 0 we require there be N sufficiently large for nN to ensure the 'difference' between μn and μ is smaller than ε. Various notions of convergence specify precisely what the word 'difference' should mean in that description; these notions are not equivalent to one another, and vary in strength.

In mathematics, a local martingale is a type of stochastic process, satisfying the localized version of the martingale property. Every martingale is a local martingale; every bounded local martingale is a martingale; in particular, every local martingale that is bounded from below is a supermartingale, and every local martingale that is bounded from above is a submartingale; however, in general a local martingale is not a martingale, because its expectation can be distorted by large values of small probability. In particular, a driftless diffusion process is a local martingale, but not necessarily a martingale.

Convergence in measure is either of two distinct mathematical concepts both of which generalize the concept of convergence in probability.

Stochastic approximation methods are a family of iterative methods typically used for root-finding problems or for optimization problems. The recursive update rules of stochastic approximation methods can be used, among other things, for solving linear systems when the collected data is corrupted by noise, or for approximating extreme values of functions which cannot be computed directly, but only estimated via noisy observations.

In mathematics, uniform integrability is an important concept in real analysis, functional analysis and measure theory, and plays a vital role in the theory of martingales.

In real analysis and measure theory, the Vitali convergence theorem, named after the Italian mathematician Giuseppe Vitali, is a generalization of the better-known dominated convergence theorem of Henri Lebesgue. It is a characterization of the convergence in Lp in terms of convergence in measure and a condition related to uniform integrability.

In probability theory, the continuous mapping theorem states that continuous functions preserve limits even if their arguments are sequences of random variables. A continuous function, in Heine’s definition, is such a function that maps convergent sequences into convergent sequences: if xnx then g(xn) → g(x). The continuous mapping theorem states that this will also be true if we replace the deterministic sequence {xn} with a sequence of random variables {Xn}, and replace the standard notion of convergence of real numbers “→” with one of the types of convergence of random variables.

The exponential mechanism is a technique for designing differentially private algorithms. It was developed by Frank McSherry and Kunal Talwar in 2007. Their work was recognized as a co-winner of the 2009 PET Award for Outstanding Research in Privacy Enhancing Technologies.

This article is supplemental for “Convergence of random variables” and provides proofs for selected results.

References

  1. Dodge, Y. (2003) The Oxford Dictionary of Statistical Terms, OUP. ISBN   0-19-920613-9
  2. Yvonne M. Bishop, Stephen E.Fienberg, Paul W. Holland. (1975,2007) Discrete multivariate analysis, Springer. ISBN   0-387-72805-8, ISBN   978-0-387-72805-6