In probability theory, the central limit theorem (CLT) states that, in many situations, when independent and identically distributed random variables are added, their properly normalized sum tends toward a normal distribution. This article gives two illustrations of this theorem. Both involve the sum of independent and identically distributed random variables and show how the probability distribution of the sum approaches the normal distribution as the number of terms in the sum increases.
The first illustration involves a continuous probability distribution, for which the random variables have a probability density function. The second illustration, for which most of the computation can be done by hand, involves a discrete probability distribution, which is characterized by a probability mass function.
The density of the sum of two independent real-valued random variables equals the convolution of the density functions of the original variables.
Thus, the density of the sum of m + n terms of a sequence of independent identically distributed variables equals the convolution of the densities of the sums of m terms and of n terms. In particular, the density of the sum of n + 1 terms equals the convolution of the density of the sum of n terms with the original density (the "sum" of one term).
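For reference, the convolution in question is the usual one: if X and Y are independent real-valued random variables with densities f_X and f_Y, the density of their sum is

f_{X+Y}(s) = (f_X ∗ f_Y)(s) = ∫ f_X(t) f_Y(s − t) dt,

with the integral taken over the whole real line.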
A probability density function is shown in the first figure below. Then the densities of the sums of two, three, and four independent identically distributed variables, each having the original density, are shown in the following figures. If the original density is a piecewise polynomial, as it is in the example, then so are the sum densities, of increasingly higher degree. Although the original density is far from normal, the density of the sum of just a few variables with that density is much smoother and has some of the qualitative features of the normal density.
The convolutions were computed via the discrete Fourier transform. A list of values y = f(x0 + k Δx) was constructed, where f is the original density function, Δx is approximately 0.002, and k runs from 0 through 1000. The discrete Fourier transform Y of y was computed. Then the convolution of f with itself is proportional to the inverse discrete Fourier transform of the pointwise product of Y with itself.
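A minimal sketch of this procedure in Python with NumPy is shown below. The piecewise density used here is a hypothetical stand-in (the exact density behind the figures is not reproduced in the text), and the grid origin x0 is likewise an assumption.

```python
import numpy as np

# Hypothetical piecewise density standing in for the article's original one:
# 1/3 on [-1.5, 0) and 1 on [0, 0.5], so that it integrates to 1.
def f(x):
    x = np.asarray(x, dtype=float)
    return np.where((x >= -1.5) & (x < 0.0), 1.0 / 3.0,
                    np.where((x >= 0.0) & (x <= 0.5), 1.0, 0.0))

dx = 0.002                        # grid spacing, as in the text
k = np.arange(1001)               # k = 0 through 1000
x0 = -1.5                         # left end of the grid (assumed)
y = f(x0 + k * dx)                # sampled density values

# Zero-pad so the FFT-based (circular) convolutions do not wrap around; the
# density of the sum of m copies is proportional to the inverse DFT of the
# m-th pointwise power of Y.
Y = np.fft.fft(y, 4 * len(y))
conv2 = np.real(np.fft.ifft(Y ** 2)) * dx         # density of the sum of 2
conv3 = np.real(np.fft.ifft(Y ** 3)) * dx ** 2    # density of the sum of 3
conv4 = np.real(np.fft.ifft(Y ** 4)) * dx ** 3    # density of the sum of 4

print(conv2.sum() * dx, conv4.sum() * dx)         # both close to 1
```

Plotting conv2 through conv4 on grids of spacing Δx starting at 2·x0, 3·x0, and 4·x0 reproduces the increasingly smooth densities described in the following paragraphs.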
We start with a probability density function. This function, although discontinuous, is far from the most pathological example that could be created. It is a piecewise polynomial, with pieces of degrees 0 and 1. The mean of this distribution is 0 and its standard deviation is 1.
Next we compute the density of the sum of two independent variables, each having the above density. The density of the sum is the convolution of the above density with itself.
The sum of two variables has mean 0. The density shown in the figure at right has been rescaled by √2, so that its standard deviation is 1; since variances of independent variables add, the sum of n variables, each with standard deviation 1, has standard deviation √n.
This density is already smoother than the original. There are obvious lumps, which correspond to the intervals on which the original density was defined.
We then compute the density of the sum of three independent variables, each having the above density. The density of the sum is the convolution of the first density with the second.
The sum of three variables has mean 0. The density shown in the figure at right has been rescaled by √3, so that its standard deviation is 1.
This density is even smoother than the preceding one. The lumps can hardly be detected in this figure.
Finally, we compute the density of the sum of four independent variables, each having the above density. The density of the sum is the convolution of the first density with the third (or the second density with itself).
The sum of four variables has mean 0. The density shown in the figure at right has been rescaled by √4, so that its standard deviation is 1.
This density appears qualitatively very similar to a normal density. No lumps can be distinguished by the eye.
This section illustrates the central limit theorem via an example for which the computation can be done quickly by hand on paper, unlike the more computing-intensive example of the previous section.
Suppose the probability distribution of a discrete random variable X puts equal weights on 1, 2, and 3:

P(X = 1) = P(X = 2) = P(X = 3) = 1/3.
The probability mass function of the random variable X may be depicted by the following bar graph:
Clearly this looks nothing like the bell-shaped curve of the normal distribution. Contrast the above with the depictions below.
Now consider the sum of two independent copies of X:

P(X1 + X2 = 2) = 1/9, P(X1 + X2 = 3) = 2/9, P(X1 + X2 = 4) = 3/9, P(X1 + X2 = 5) = 2/9, P(X1 + X2 = 6) = 1/9.
The probability mass function of this sum may be depicted thus:
This still does not look very much like the bell-shaped curve, but, like the bell-shaped curve and unlike the probability mass function of X itself, it is higher in the middle than in the two tails.
Now consider the sum of three independent copies of this random variable:

P(X1 + X2 + X3 = 3) = 1/27, P(X1 + X2 + X3 = 4) = 3/27, P(X1 + X2 + X3 = 5) = 6/27, P(X1 + X2 + X3 = 6) = 7/27, P(X1 + X2 + X3 = 7) = 6/27, P(X1 + X2 + X3 = 8) = 3/27, P(X1 + X2 + X3 = 9) = 1/27.
The probability mass function of this sum may be depicted thus:
Not only is this bigger at the center than it is at the tails, but as one moves toward the center from either tail, the slope first increases and then decreases, just as with the bell-shaped curve.
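These hand computations can be checked with a few lines of Python, since the probability mass function of a sum of independent variables is the discrete convolution of their probability mass functions:

```python
import numpy as np

# pmf of X on the support {1, 2, 3}: equal weights of 1/3.
pmf_x = np.array([1.0, 1.0, 1.0]) / 3.0

# The pmf of a sum of independent variables is the convolution of their pmfs.
pmf_2 = np.convolve(pmf_x, pmf_x)    # support {2, ..., 6}
pmf_3 = np.convolve(pmf_2, pmf_x)    # support {3, ..., 9}

print(pmf_2 * 9)    # approximately [1, 2, 3, 2, 1], i.e. 1/9, 2/9, 3/9, 2/9, 1/9
print(pmf_3 * 27)   # approximately [1, 3, 6, 7, 6, 3, 1], i.e. 1/27, 3/27, ..., 1/27
```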
The degree of its resemblance to the bell-shaped curve can be quantified as follows. Consider

P(X1 + X2 + X3 ≤ 7) = 1/27 + 3/27 + 6/27 + 7/27 + 6/27 = 23/27 ≈ 0.85185.
How close is this to what a normal approximation would give? It can readily be seen that the expected value of Y = X1 + X2 + X3 is 6 and the standard deviation of Y is the square root of 2. Since Y ≤ 7 (weak inequality) if and only if Y < 8 (strict inequality), we use a continuity correction and seek

P(Y ≤ 7.5) = P((Y − 6)/√2 ≤ (7.5 − 6)/√2) = P(Z ≤ 1.0607) ≈ 0.85558,
where Z has a standard normal distribution. The difference between 0.85185... and 0.85558... seems remarkably small when it is considered that the number of independent random variables that were added was only three.
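Both numbers are easy to reproduce; a short sketch using only the Python standard library:

```python
from math import erf, sqrt

# Exact probability that Y = X1 + X2 + X3 is at most 7.
exact = (1 + 3 + 6 + 7 + 6) / 27
print(exact)                             # 0.85185...

# Normal approximation with continuity correction: P(Y <= 7.5) for a normal
# variable with mean 6 and standard deviation sqrt(2).
z = (7.5 - 6) / sqrt(2)                  # standardized value, about 1.0607
approx = 0.5 * (1 + erf(z / sqrt(2)))    # standard normal CDF evaluated at z
print(approx)                            # 0.85558...
```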
The following image shows the result of a simulation based on the example presented on this page. Sampling from this uniform distribution is repeated 1,000 times, and the results are summed.
Since the simulation is based on the Monte Carlo method, the whole process is repeated 10,000 times. The results show that the distribution of the sum of 1,000 uniform draws resembles the bell-shaped curve very well.
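A sketch of such a simulation in Python, assuming the draws are taken from the discrete uniform distribution on {1, 2, 3} used in the example above:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# 10,000 Monte Carlo repetitions; each repetition sums 1,000 draws from the
# discrete uniform distribution on {1, 2, 3}.
draws = rng.integers(1, 4, size=(10_000, 1_000))
sums = draws.sum(axis=1)

# The histogram of the 10,000 sums should closely match a normal density with
# mean 2000 and standard deviation sqrt(1000 * 2/3), about 25.8.
plt.hist(sums, bins=50, density=True)
plt.xlabel("sum of 1,000 draws")
plt.ylabel("relative frequency")
plt.show()
```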
In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is

f(x) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²)),

where μ is the mean and σ is the standard deviation.
Probability theory or probability calculus is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expressing it through a set of axioms. Typically these axioms formalise probability in terms of a probability space, which assigns a measure taking values between 0 and 1, termed the probability measure, to a set of outcomes called the sample space. Any specified subset of the sample space is called an event.
In probability theory and statistics, variance is the expected value of the squared deviation from the mean of a random variable. The standard deviation is obtained as the square root of the variance. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbers is spread out from their average value. It is the second central moment of a distribution, and the covariance of the random variable with itself, and it is often represented by σ², s², or Var(X).
In probability theory, the central limit theorem (CLT) states that, under appropriate conditions, the distribution of a normalized version of the sample mean converges to a standard normal distribution. This holds even if the original variables themselves are not normally distributed. There are several versions of the CLT, each applying in the context of different conditions.
In probability theory, a probability density function (PDF), density function, or density of an absolutely continuous random variable, is a function whose value at any given sample in the sample space can be interpreted as providing a relative likelihood that the value of the random variable would be equal to that sample. Probability density is the probability per unit length, in other words, while the absolute likelihood for a continuous random variable to take on any particular value is 0, the value of the PDF at two different samples can be used to infer, in any particular draw of the random variable, how much more likely it is that the random variable would be close to one sample compared to the other sample.
In probability theory and statistics, the multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions. One definition is that a random vector is said to be k-variate normally distributed if every linear combination of its k components has a univariate normal distribution. Its importance derives mainly from the multivariate central limit theorem. The multivariate normal distribution is often used to describe, at least approximately, any set of (possibly) correlated real-valued random variables each of which clusters around a mean value.
In mathematics, a degenerate distribution is, according to some, a probability distribution in a space with support only on a manifold of lower dimension, and according to others a distribution with support only at a single point. By the latter definition, it is a deterministic distribution and takes only a single value. Examples include a two-headed coin and rolling a die whose sides all show the same number. This distribution satisfies the definition of "random variable" even though it does not appear random in the everyday sense of the word; hence it is considered degenerate.
In mathematics, the moments of a function are certain quantitative measures related to the shape of the function's graph. If the function represents mass density, then the zeroth moment is the total mass, the first moment is the center of mass, and the second moment is the moment of inertia. If the function is a probability distribution, then the first moment is the expected value, the second central moment is the variance, the third standardized moment is the skewness, and the fourth standardized moment is the kurtosis. The mathematical concept is closely related to the concept of moment in physics.
In probability theory and statistics, the logistic distribution is a continuous probability distribution. Its cumulative distribution function is the logistic function, which appears in logistic regression and feedforward neural networks. It resembles the normal distribution in shape but has heavier tails. The logistic distribution is a special case of the Tukey lambda distribution.
Given two random variables that are defined on the same probability space, the joint probability distribution is the corresponding probability distribution on all possible pairs of outputs. The joint distribution can just as well be considered for any given number of random variables. The joint distribution encodes the marginal distributions, i.e. the distributions of each of the individual random variables and the conditional probability distributions, which deal with how the outputs of one random variable are distributed when given information on the outputs of the other random variable(s).
In probability theory and statistics, the Laplace distribution is a continuous probability distribution named after Pierre-Simon Laplace. It is also sometimes called the double exponential distribution, because it can be thought of as two exponential distributions spliced together along the abscissa, although the term is also sometimes used to refer to the Gumbel distribution. The difference between two independent identically distributed exponential random variables is governed by a Laplace distribution, as is a Brownian motion evaluated at an exponentially distributed random time. Increments of Laplace motion or a variance gamma process evaluated over the time scale also have a Laplace distribution.
In probability theory, a distribution is said to be stable if a linear combination of two independent random variables with this distribution has the same distribution, up to location and scale parameters. A random variable is said to be stable if its distribution is stable. The stable distribution family is also sometimes referred to as the Lévy alpha-stable distribution, after Paul Lévy, the first mathematician to have studied it.
In statistics and information theory, a maximum entropy probability distribution has entropy that is at least as great as that of all other members of a specified class of probability distributions. According to the principle of maximum entropy, if nothing is known about a distribution except that it belongs to a certain class, then the distribution with the largest entropy should be chosen as the least-informative default. The motivation is twofold: first, maximizing entropy minimizes the amount of prior information built into the distribution; second, many physical systems tend to move towards maximal entropy configurations over time.
In probability theory and statistics, the Rademacher distribution is a discrete probability distribution where a random variate X has a 50% chance of being +1 and a 50% chance of being -1.
In probability theory and statistics, the characteristic function of any real-valued random variable completely defines its probability distribution. If a random variable admits a probability density function, then the characteristic function is the Fourier transform of the probability density function. Thus it provides an alternative route to analytical results compared with working directly with probability density functions or cumulative distribution functions. There are particularly simple results for the characteristic functions of distributions defined by the weighted sums of random variables.
In probability theory, an indecomposable distribution is a probability distribution that cannot be represented as the distribution of the sum of two or more non-constant independent random variables: Z ≠ X + Y. If it can be so expressed, it is decomposable: Z = X + Y. If, further, it can be expressed as the distribution of the sum of two or more independent identically distributed random variables, then it is divisible: Z = X1 + X2.
In probability theory, heavy-tailed distributions are probability distributions whose tails are not exponentially bounded: that is, they have heavier tails than the exponential distribution. In many applications it is the right tail of the distribution that is of interest, but a distribution may have a heavy left tail, or both tails may be heavy.
In probability theory and statistics, there are several relationships among probability distributions. These relations can be categorized into several groups.