Second moment method

In mathematics, the second moment method is a technique used in probability theory and analysis to show that a random variable has positive probability of being positive. More generally, the "moment method" consists of bounding the probability that a random variable fluctuates far from its mean by using its moments.[1]

The method is often quantitative, in that one can often deduce a lower bound on the probability that the random variable is larger than some constant times its expectation. The method involves comparing the second moment of random variables to the square of the first moment.

First moment method

The first moment method is a simple application of Markov's inequality for integer-valued variables. For a non-negative, integer-valued random variable X, we may want to prove that X = 0 with high probability. To obtain an upper bound for P(X > 0), and thus a lower bound for P(X = 0), we first note that since X takes only integer values, P(X > 0) = P(X ≥ 1). Since X is non-negative we can now apply Markov's inequality to obtain P(X ≥ 1) ≤ E[X]. Combining these we have P(X > 0) ≤ E[X]; the first moment method is simply the use of this inequality.
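For a concrete feel for the bound, the following Python sketch estimates both sides of P(X > 0) ≤ E[X] by Monte Carlo. The choice of X (the number of length-k all-heads runs in n fair coin flips) and all function names and parameter values are illustrative only.

```python
# Monte Carlo sketch of the first moment bound P(X > 0) <= E[X].
# X counts length-k all-heads runs in n fair coin flips (a toy example).
import random

def count_runs(n, k, rng):
    flips = [rng.random() < 0.5 for _ in range(n)]
    # number of starting positions i where flips[i:i+k] are all heads
    return sum(all(flips[i:i + k]) for i in range(n - k + 1))

def estimate(n=50, k=10, trials=20000, seed=0):
    rng = random.Random(seed)
    samples = [count_runs(n, k, rng) for _ in range(trials)]
    mean_x = sum(samples) / trials                 # empirical E[X]
    p_pos = sum(s > 0 for s in samples) / trials   # empirical P(X > 0)
    return mean_x, p_pos

if __name__ == "__main__":
    mean_x, p_pos = estimate()
    # Up to Monte Carlo error, p_pos should not exceed mean_x.
    print(f"E[X] ~ {mean_x:.4f}   P(X > 0) ~ {p_pos:.4f}")
```

Here E[X] = (n − k + 1) · 2^(−k) exactly, and the estimated P(X > 0) should come out no larger, as the bound requires.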

Second moment method

In the other direction, E[X] being "large" does not directly imply that P(X = 0) is small. However, we can often use the second moment to derive such a conclusion, using the Cauchy–Schwarz inequality.

Theorem: If X ≥ 0 is a random variable with finite variance, then

P(X > 0) ≥ (E[X])² / E[X²].

Proof: Using the Cauchy–Schwarz inequality, we have

E[X] = E[X 1{X > 0}] ≤ √(E[X²]) √(P(X > 0)).

Solving for P(X > 0), the desired inequality then follows. ∎
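As a quick numerical illustration of the theorem, the following sketch evaluates P(X > 0) and (E[X])²/E[X²] in closed form for X ~ Binomial(n, q); the particular parameter values are arbitrary choices.

```python
# Check the second moment bound P(X > 0) >= (E[X])^2 / E[X^2]
# for X ~ Binomial(n, q), where every quantity has a closed form.
def binomial_check(n, q):
    mean = n * q                            # E[X]
    second = n * q * (1 - q) + mean ** 2    # E[X^2] = Var(X) + (E[X])^2
    p_pos = 1 - (1 - q) ** n                # P(X > 0)
    return p_pos, mean ** 2 / second

if __name__ == "__main__":
    for n, q in [(10, 0.05), (100, 0.01), (1000, 0.002)]:
        p_pos, bound = binomial_check(n, q)
        print(f"n={n:4d}, q={q}:  P(X > 0) = {p_pos:.4f} >= {bound:.4f}")
```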

The method can also be used on distributional limits of random variables. Furthermore, the estimate of the previous theorem can be refined by means of the so-called Paley–Zygmund inequality. Suppose that Xn is a sequence of non-negative real-valued random variables which converge in law to a random variable X. If there are finite positive constants c1, c2 such that

E[Xn] ≥ c1 and E[Xn²] ≤ c2 E[Xn]²

hold for every n, then it follows from the Paley–Zygmund inequality that for every n and θ in (0, 1)

P(Xn ≥ θ c1) ≥ (1 − θ)² / c2.

Consequently, the same inequality is satisfied by X.
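The Paley–Zygmund inequality itself, P(X ≥ θE[X]) ≥ (1 − θ)² (E[X])²/E[X²], can be checked exactly on any finite non-negative distribution. The sketch below does so; the probability mass function and function name used are made-up examples.

```python
# Exact check of the Paley–Zygmund inequality
# P(X >= theta * E[X]) >= (1 - theta)^2 * (E[X])^2 / E[X^2]
# for a finite non-negative distribution given as a pmf.
def paley_zygmund_check(pmf, theta):
    mean = sum(x * p for x, p in pmf.items())
    second = sum(x * x * p for x, p in pmf.items())
    lhs = sum(p for x, p in pmf.items() if x >= theta * mean)
    rhs = (1 - theta) ** 2 * mean ** 2 / second
    return lhs, rhs

if __name__ == "__main__":
    pmf = {0: 0.6, 1: 0.3, 5: 0.1}   # arbitrary toy distribution
    for theta in (0.1, 0.5, 0.9):
        lhs, rhs = paley_zygmund_check(pmf, theta)
        print(f"theta = {theta}:  P(X >= theta E[X]) = {lhs:.4f} >= {rhs:.4f}")
```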

Example application of method

Setup of problem

The Bernoulli bond percolation subgraph of a graph G at parameter p is a random subgraph obtained from G by deleting every edge of G with probability 1−p, independently. The infinite complete binary tree T is an infinite tree where one vertex (called the root) has two neighbors and every other vertex has three neighbors. The second moment method can be used to show that, at every parameter p ∈ (1/2, 1], the connected component of the root in the percolation subgraph of T is infinite with positive probability.
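The claimed phase transition at p = 1/2 can be seen numerically before giving the proof. The following Monte Carlo sketch (an illustration only, not part of the argument) tracks just the number of depth-d vertices connected to the root, using the fact that each connected vertex independently retains each of its two child edges with probability p; the depth, trial count, and values of p are arbitrary choices, and NumPy is assumed to be available.

```python
# Estimate P(the root's percolation component reaches depth n) on the
# binary tree, for several p. Reaching ever larger depths approximates
# the event that the component is infinite.
import numpy as np

def survives_to_depth(p, n, rng):
    alive = 1  # number of depth-d vertices connected to the root
    for _ in range(n):
        # each connected vertex keeps each of its 2 child edges with prob. p,
        # so the next level has Binomial(2 * alive, p) connected vertices
        alive = rng.binomial(2 * alive, p)
        if alive == 0:
            return False
    return True

def estimate_survival(p, n=30, trials=2000, seed=0):
    rng = np.random.default_rng(seed)
    return sum(survives_to_depth(p, n, rng) for _ in range(trials)) / trials

if __name__ == "__main__":
    for p in (0.4, 0.5, 0.6, 0.8):
        print(f"p = {p}: P(component reaches depth 30) ~ {estimate_survival(p):.3f}")
```

For p < 1/2 the estimate is essentially zero, at p = 1/2 it decays to zero only slowly as the depth grows, and for p > 1/2 it stabilizes at a positive value, in line with the claim proved below. (The component of the root evolves exactly as a Galton–Watson branching process with Binomial(2, p) offspring.)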

Application of method

Let K be the percolation component of the root, and let Tn be the set of vertices of T that are at distance n from the root. Let Xn be the number of vertices in Tn ∩ K. To prove that K is infinite with positive probability, it is enough to show that lim sup_n Xn > 0 with positive probability. By the reverse Fatou lemma, it suffices to show that inf_n P(Xn > 0) > 0. The Cauchy–Schwarz inequality gives

E[Xn]² = (E[Xn 1{Xn > 0}])² ≤ E[Xn²] P(Xn > 0).

Therefore, it is sufficient to show that

inf_n E[Xn]² / E[Xn²] > 0,

that is, that the second moment is bounded from above by a constant times the first moment squared (and both are nonzero). In many applications of the second moment method, one is not able to calculate the moments precisely, but can nevertheless establish this inequality.

In this particular application, these moments can be calculated. For every specific v in Tn, the path from the root to v consists of n edges, each retained independently with probability p, so

P(v ∈ K) = p^n.

Since |Tn| = 2^n, it follows that

E[Xn] = 2^n p^n = (2p)^n,

which is the first moment. Now comes the second moment calculation.

For each pair v, u in Tn let w(v, u) denote the vertex in T that is farthest away from the root and lies on the simple path in T to each of the two vertices v and u, and let k(v, u) denote the distance from w to the root. In order for v, u to both be in K, it is necessary and sufficient for the three simple paths from w(v, u) to v, u and the root to be in K. Since the number of edges contained in the union of these three paths is 2n − k(v, u), we obtain

P(v, u ∈ K) = p^(2n − k(v, u)).

The number of ordered pairs (v, u) with v ≠ u such that k(v, u) = s is 2^s · 2^(n−s) · 2^(n−s−1) = 2^(2n−s−1), for s = 0, 1, ..., n − 1, while the 2^n diagonal pairs with v = u have k(v, u) = n. Hence,

E[Xn²] = Σ_{s=0}^{n−1} 2^(2n−s−1) p^(2n−s) + 2^n p^n = (1/2) (2p)^(2n) Σ_{s=0}^{n−1} (2p)^(−s) + (2p)^n ≤ (p/(2p−1) + 1) (2p)^(2n) = C E[Xn]²,

where C = p/(2p−1) + 1 is a finite constant because 2p > 1. Consequently E[Xn]²/E[Xn²] ≥ 1/C > 0 for every n,

which completes the proof.
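The closed-form moments above are easy to verify numerically. The sketch below evaluates E[Xn] = (2p)^n and the sum formula for E[Xn²] directly and prints the ratio E[Xn²]/E[Xn]², which for p > 1/2 stays bounded (it tends to p/(2p − 1) as n grows); the value p = 0.6 is an arbitrary choice.

```python
# Evaluate the moment formulas for the binary-tree percolation example:
# E[Xn] = (2p)^n,
# E[Xn^2] = sum_{s=0}^{n-1} 2^(2n-s-1) p^(2n-s) + 2^n p^n,
# and check that E[Xn^2] / E[Xn]^2 stays bounded when p > 1/2.
def first_moment(p, n):
    return (2 * p) ** n

def second_moment(p, n):
    diag = (2 * p) ** n  # the 2^n pairs with v = u, each of probability p^n
    off = sum(2 ** (2 * n - s - 1) * p ** (2 * n - s) for s in range(n))
    return off + diag

if __name__ == "__main__":
    p = 0.6
    for n in (5, 10, 20, 40):
        m1, m2 = first_moment(p, n), second_moment(p, n)
        print(f"n = {n:2d}:  E[Xn] = {m1:.3e}   E[Xn^2]/E[Xn]^2 = {m2 / m1 ** 2:.4f}")
```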

Discussion

In other applications, the corresponding useful random variables are integrals

Xn = ∫ fn(t) dμ(t),

where the functions fn are random. In such a situation, one considers the product measure μ × μ and calculates

E[Xn²] = E[ ∫∫ fn(x) fn(y) dμ(x) dμ(y) ] = ∫∫ E[fn(x) fn(y)] dμ(x) dμ(y),

where the last step is typically justified using Fubini's theorem.
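A minimal sketch of this computation, assuming a toy setup in which μ is the uniform measure on a finite grid in [0, 1] and fn(t) = 1{U ≤ t} for a single uniform random variable U (so that E[fn(x) fn(y)] = min(x, y) is explicit), is given below; the grid, the number of Monte Carlo trials, and the function names are all illustrative choices.

```python
# Compare E[Xn^2] computed via the product-measure / Fubini formula
# with a direct Monte Carlo estimate, for Xn = integral of f over a grid,
# where f(t) = 1{U <= t} and U is uniform on [0, 1].
import random

def second_moment_fubini(grid):
    # E[Xn^2] = double integral of E[f(x) f(y)] = min(x, y) over mu x mu
    m = len(grid)
    return sum(min(x, y) for x in grid for y in grid) / m ** 2

def second_moment_monte_carlo(grid, trials=200000, seed=0):
    rng = random.Random(seed)
    m = len(grid)
    total = 0.0
    for _ in range(trials):
        u = rng.random()
        xn = sum(1 for t in grid if u <= t) / m   # Xn = integral of f d(mu)
        total += xn * xn
    return total / trials

if __name__ == "__main__":
    grid = [i / 10 for i in range(1, 11)]
    print("Fubini formula :", second_moment_fubini(grid))
    print("Monte Carlo    :", second_moment_monte_carlo(grid))
```

The two numbers should agree up to Monte Carlo error.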

References

  1. Terence Tao (2008-06-18). "The strong law of large numbers". What's new. Retrieved 2009-02-10.