In probability theory and statistics, the moment-generating function of a real-valued random variable is an alternative specification of its probability distribution. Thus, it provides the basis of an alternative route to analytical results compared with working directly with probability density functions or cumulative distribution functions. There are particularly simple results for the moment-generating functions of distributions defined by the weighted sums of random variables. However, not all random variables have moment-generating functions.
As its name implies, the moment-generating function can be used to compute a distribution’s moments: the nth moment about 0 is the nth derivative of the moment-generating function, evaluated at 0.
In addition to real-valued distributions (univariate distributions), moment-generating functions can be defined for vector- or matrix-valued random variables, and can even be extended to more general cases.
The moment-generating function of a real-valued distribution does not always exist, unlike the characteristic function. There are relations between the behavior of the moment-generating function of a distribution and properties of the distribution, such as the existence of moments.
Let $X$ be a random variable with CDF $F_X$. The moment-generating function (mgf) of $X$ (or $F_X$), denoted by $M_X(t)$, is
$M_X(t) = \operatorname{E}\left[e^{tX}\right],$
provided this expectation exists for $t$ in some open neighborhood of 0. That is, there is an $h > 0$ such that, for all $t$ in $(-h, h)$, $\operatorname{E}\left[e^{tX}\right]$ exists. If the expectation does not exist in an open neighborhood of 0, we say that the moment-generating function does not exist. [1]
In other words, the moment-generating function of $X$ is the expectation of the random variable $e^{tX}$. More generally, when $\mathbf{X} = (X_1, \ldots, X_n)^{\mathrm{T}}$ is an $n$-dimensional random vector and $\mathbf{t}$ is a fixed vector, one uses $\mathbf{t} \cdot \mathbf{X} = \mathbf{t}^{\mathrm{T}}\mathbf{X}$ instead of $tX$:
$M_{\mathbf{X}}(\mathbf{t}) := \operatorname{E}\left[e^{\mathbf{t}^{\mathrm{T}}\mathbf{X}}\right].$
$M_X(0)$ always exists and is equal to 1. However, a key problem with moment-generating functions is that moments and the moment-generating function may not exist, as the integrals need not converge absolutely. By contrast, the characteristic function or Fourier transform always exists (because it is the integral of a bounded function on a space of finite measure), and for some purposes may be used instead.
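As a quick illustration of this existence issue, the sketch below (not from the source; the Exponential(1) example and the SciPy quadrature are my assumptions) evaluates the defining integral numerically and compares it with the closed form $1/(1-t)$, which is only valid for $t < 1$.

```python
# Sketch: E[e^{tX}] for an assumed Exponential(1) example; the defining integral
# converges only for t < 1, so the MGF exists on an open neighborhood of 0
# (indeed on all of (-inf, 1)).
import numpy as np
from scipy import integrate, stats

def mgf_integral(t):
    # integral of e^{tx} * f_X(x) over [0, inf) with f_X(x) = e^{-x}, the Exp(1) density
    return integrate.quad(lambda x: np.exp(t * x) * stats.expon.pdf(x), 0, np.inf)[0]

for t in (-0.5, 0.0, 0.5, 0.9):
    print(t, mgf_integral(t), 1.0 / (1.0 - t))   # closed form 1/(1 - t), valid for t < 1
# For t >= 1 the integrand no longer decays and the integral diverges.
```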
The moment-generating function is so named because it can be used to find the moments of the distribution. [2] The series expansion of $e^{tX}$ is
$e^{tX} = 1 + tX + \frac{t^2 X^2}{2!} + \frac{t^3 X^3}{3!} + \cdots + \frac{t^n X^n}{n!} + \cdots.$
Hence
$M_X(t) = \operatorname{E}\left[e^{tX}\right] = 1 + t m_1 + \frac{t^2 m_2}{2!} + \frac{t^3 m_3}{3!} + \cdots + \frac{t^n m_n}{n!} + \cdots,$
where $m_n$ is the $n$th moment. Differentiating $M_X(t)$ $i$ times with respect to $t$ and setting $t = 0$, we obtain the $i$th moment about the origin, $m_i$; see Calculations of moments below.
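As a small sketch of this (the standard normal choice below is an assumption, not part of the text), the moments can be recovered by symbolic differentiation of the moment-generating function at $t = 0$:

```python
# Sketch: differentiate the MGF of an assumed standard normal, M(t) = exp(t^2/2),
# n times at t = 0 to recover the raw moments 1, 0, 1, 0, 3, 0, ...
import sympy as sp

t = sp.symbols('t')
M = sp.exp(t**2 / 2)                                   # MGF of N(0, 1)
moments = [sp.diff(M, t, n).subs(t, 0) for n in range(6)]
print(moments)                                         # [1, 0, 1, 0, 3, 0]
```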
If $X$ is a continuous random variable, the following relation between its moment-generating function $M_X(t)$ and the two-sided Laplace transform of its probability density function $f_X(x)$ holds:
$M_X(t) = \mathcal{L}\{f_X\}(-t),$
since the PDF's two-sided Laplace transform is given as
$\mathcal{L}\{f_X\}(s) = \int_{-\infty}^{\infty} e^{-sx} f_X(x)\,\mathrm{d}x,$
and the moment-generating function's definition expands (by the law of the unconscious statistician) to
$M_X(t) = \operatorname{E}\left[e^{tX}\right] = \int_{-\infty}^{\infty} e^{tx} f_X(x)\,\mathrm{d}x.$
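A minimal numerical sketch of this relation, assuming a standard normal density (my own choice) and SciPy quadrature: the integral of $e^{tx} f_X(x)$, i.e. the two-sided Laplace transform of the PDF evaluated at $-t$, matches the closed-form MGF $e^{t^2/2}$.

```python
# Sketch: check M_X(t) = L{f_X}(-t) numerically for an assumed N(0, 1) example.
import numpy as np
from scipy import integrate, stats

t = 0.7
mgf_numeric, _ = integrate.quad(lambda x: np.exp(t * x) * stats.norm.pdf(x),
                                -np.inf, np.inf)       # e^{tx} f_X(x) integrated over R
print(mgf_numeric, np.exp(t**2 / 2))                   # both ~ 1.2776
```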
This is consistent with the characteristic function of $X$ being a Wick rotation of $M_X(t)$ when the moment-generating function exists, as the characteristic function of a continuous random variable $X$ is the Fourier transform of its probability density function $f_X(x)$, and in general when a function $f(x)$ is of exponential order, the Fourier transform of $f$ is a Wick rotation of its two-sided Laplace transform in the region of convergence. See the relation of the Fourier and Laplace transforms for further information.
Here are some examples of the moment-generating function and the characteristic function for comparison. It can be seen that the characteristic function is a Wick rotation of the moment-generating function when the latter exists.
Distribution | Moment-generating function $M_X(t)$ | Characteristic function $\varphi(t)$ |
---|---|---|
Degenerate $\delta_a$ | $e^{ta}$ | $e^{ita}$ |
Bernoulli $P(X=1)=p$ | $1-p+pe^{t}$ | $1-p+pe^{it}$ |
Binomial $B(n,p)$ | $\left(1-p+pe^{t}\right)^{n}$ | $\left(1-p+pe^{it}\right)^{n}$ |
Geometric $(1-p)^{k-1}p$ | $\dfrac{pe^{t}}{1-(1-p)e^{t}}$, for $t<-\ln(1-p)$ | $\dfrac{pe^{it}}{1-(1-p)e^{it}}$ |
Negative binomial $\operatorname{NB}(r,p)$ (failures before the $r$th success) | $\left(\dfrac{p}{1-(1-p)e^{t}}\right)^{r}$, for $t<-\ln(1-p)$ | $\left(\dfrac{p}{1-(1-p)e^{it}}\right)^{r}$ |
Poisson $\operatorname{Pois}(\lambda)$ | $e^{\lambda(e^{t}-1)}$ | $e^{\lambda(e^{it}-1)}$ |
Uniform (continuous) $U(a,b)$ | $\dfrac{e^{tb}-e^{ta}}{t(b-a)}$ | $\dfrac{e^{itb}-e^{ita}}{it(b-a)}$ |
Uniform (discrete) $DU(a,b)$ | $\dfrac{e^{at}-e^{(b+1)t}}{(b-a+1)\left(1-e^{t}\right)}$ | $\dfrac{e^{ait}-e^{(b+1)it}}{(b-a+1)\left(1-e^{it}\right)}$ |
Laplace $L(\mu,b)$ | $\dfrac{e^{t\mu}}{1-b^{2}t^{2}}$, for $\lvert t\rvert<\tfrac{1}{b}$ | $\dfrac{e^{it\mu}}{1+b^{2}t^{2}}$ |
Normal $N(\mu,\sigma^{2})$ | $e^{t\mu+\frac{1}{2}\sigma^{2}t^{2}}$ | $e^{it\mu-\frac{1}{2}\sigma^{2}t^{2}}$ |
Chi-squared $\chi^{2}_{k}$ | $(1-2t)^{-k/2}$, for $t<\tfrac{1}{2}$ | $(1-2it)^{-k/2}$ |
Noncentral chi-squared $\chi^{2}_{k}(\lambda)$ | $e^{\lambda t/(1-2t)}\,(1-2t)^{-k/2}$ | $e^{i\lambda t/(1-2it)}\,(1-2it)^{-k/2}$ |
Gamma $\Gamma(k,\theta)$ (shape $k$, scale $\theta$) | $(1-t\theta)^{-k}$, for $t<\tfrac{1}{\theta}$ | $(1-it\theta)^{-k}$ |
Exponential $\operatorname{Exp}(\lambda)$ | $\left(1-t\lambda^{-1}\right)^{-1}$, for $t<\lambda$ | $\left(1-it\lambda^{-1}\right)^{-1}$ |
Beta $\operatorname{Beta}(\alpha,\beta)$ | $1+\sum_{k=1}^{\infty}\left(\prod_{r=0}^{k-1}\dfrac{\alpha+r}{\alpha+\beta+r}\right)\dfrac{t^{k}}{k!}$ | ${}_{1}F_{1}(\alpha;\alpha+\beta;it)$ (see Confluent hypergeometric function) |
Multivariate normal $N(\boldsymbol{\mu},\boldsymbol{\Sigma})$ | $e^{\mathbf{t}^{\mathrm{T}}\boldsymbol{\mu}+\frac{1}{2}\mathbf{t}^{\mathrm{T}}\boldsymbol{\Sigma}\mathbf{t}}$ | $e^{i\mathbf{t}^{\mathrm{T}}\boldsymbol{\mu}-\frac{1}{2}\mathbf{t}^{\mathrm{T}}\boldsymbol{\Sigma}\mathbf{t}}$ |
Cauchy $\operatorname{Cauchy}(\mu,\theta)$ | Does not exist | $e^{it\mu-\theta\lvert t\rvert}$ |
Multivariate Cauchy | Does not exist | $e^{i\mathbf{t}^{\mathrm{T}}\boldsymbol{\mu}-\sqrt{\mathbf{t}^{\mathrm{T}}\boldsymbol{\Sigma}\mathbf{t}}}$ |
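The table entries can be spot-checked empirically. The following Monte Carlo sketch (the distributions and parameters are chosen here purely for illustration) compares sample averages of $e^{tX}$ with the closed-form Poisson and normal rows.

```python
# Sketch: Monte Carlo spot-check of two table rows (illustrative parameters).
import numpy as np

rng = np.random.default_rng(0)
t, n = 0.3, 1_000_000

x_pois = rng.poisson(lam=2.0, size=n)                        # Poisson(lambda = 2)
print(np.mean(np.exp(t * x_pois)),
      np.exp(2.0 * (np.exp(t) - 1)))                         # e^{lambda(e^t - 1)}

x_norm = rng.normal(loc=1.0, scale=2.0, size=n)              # N(mu = 1, sigma^2 = 4)
print(np.mean(np.exp(t * x_norm)),
      np.exp(1.0 * t + 0.5 * 4.0 * t**2))                    # e^{mu t + sigma^2 t^2 / 2}
```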
The moment-generating function is the expectation of a function of the random variable; it can be written as follows:
- For a discrete probability mass function, $M_X(t)=\sum_{i=0}^{\infty} e^{t x_i}\, p_i$.
- For a continuous probability density function $f(x)$, $M_X(t)=\int_{-\infty}^{\infty} e^{tx} f(x)\,\mathrm{d}x$.
- In the general case, $M_X(t)=\int_{-\infty}^{\infty} e^{tx}\,\mathrm{d}F(x)$, using the Riemann–Stieltjes integral, where $F$ is the cumulative distribution function.
Note that for the case where $X$ has a continuous probability density function $f(x)$, $M_X(-t)$ is the two-sided Laplace transform of $f(x)$:
$M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,\mathrm{d}x = \int_{-\infty}^{\infty} \left(1 + tx + \frac{t^2x^2}{2!} + \cdots\right) f(x)\,\mathrm{d}x = 1 + t m_1 + \frac{t^2 m_2}{2!} + \cdots,$
where $m_n$ is the $n$th moment.
If random variable $X$ has moment-generating function $M_X(t)$, then $\alpha X + \beta$ has moment-generating function
$M_{\alpha X+\beta}(t) = \operatorname{E}\left[e^{(\alpha X+\beta)t}\right] = e^{\beta t}\operatorname{E}\left[e^{\alpha t X}\right] = e^{\beta t} M_X(\alpha t).$
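A short symbolic sketch of this linear-transformation rule, using an assumed standard normal $X$ (so that $aX + b \sim N(b, a^2)$):

```python
# Sketch: verify M_{aX+b}(t) = e^{bt} M_X(at) symbolically for an assumed X ~ N(0, 1),
# for which aX + b ~ N(b, a^2).
import sympy as sp

t, a, b = sp.symbols('t a b')
M_X = sp.exp(t**2 / 2)                                   # MGF of N(0, 1)
M_aXb = sp.exp(b * t + a**2 * t**2 / 2)                  # MGF of N(b, a^2)
print(sp.simplify(M_aXb - sp.exp(b * t) * M_X.subs(t, a * t)))   # prints 0
```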
If $S_n = \sum_{i=1}^{n} a_i X_i$, where the $X_i$ are independent random variables and the $a_i$ are constants, then the probability density function for $S_n$ is the convolution of the probability density functions of each of the $X_i$, and the moment-generating function for $S_n$ is given by
$M_{S_n}(t) = M_{X_1}(a_1 t)\, M_{X_2}(a_2 t)\cdots M_{X_n}(a_n t).$
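As a hedged numerical check of this product rule for weighted sums (the exponential/normal mix and the weights below are my own illustrative choices):

```python
# Sketch: for S = a1*X1 + a2*X2 with independent X1 ~ Exp(1) and X2 ~ N(0, 1),
# the MGF of S should equal M_{X1}(a1*t) * M_{X2}(a2*t).
import numpy as np

rng = np.random.default_rng(1)
t, n = 0.2, 1_000_000
a1, a2 = 0.5, 2.0

x1 = rng.exponential(scale=1.0, size=n)          # Exp(1): M(t) = 1/(1 - t) for t < 1
x2 = rng.normal(size=n)                          # N(0, 1): M(t) = e^{t^2/2}
s = a1 * x1 + a2 * x2

print(np.mean(np.exp(t * s)),                                # empirical E[e^{tS}]
      (1.0 / (1.0 - a1 * t)) * np.exp((a2 * t) ** 2 / 2))    # product of the two MGFs
```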
For vector-valued random variables $\mathbf{X}$ with real components, the moment-generating function is given by
$M_{\mathbf{X}}(\mathbf{t}) = \operatorname{E}\left[e^{\langle \mathbf{t}, \mathbf{X}\rangle}\right],$
where $\mathbf{t}$ is a vector and $\langle \mathbf{t}, \mathbf{X}\rangle$ is the dot product.
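A brief Monte Carlo sketch of the vector case, assuming a two-dimensional multivariate normal (the mean, covariance, and $\mathbf{t}$ below are illustrative choices), compared with its known closed-form MGF $e^{\mathbf{t}^{\mathrm{T}}\boldsymbol{\mu}+\frac{1}{2}\mathbf{t}^{\mathrm{T}}\boldsymbol{\Sigma}\mathbf{t}}$:

```python
# Sketch: vector MGF E[exp(<t, X>)] for an assumed bivariate normal X ~ N(mu, Sigma),
# compared against exp(t.mu + 0.5 * t' Sigma t).
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -0.5])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
tv = np.array([0.2, 0.1])

X = rng.multivariate_normal(mu, Sigma, size=1_000_000)
print(np.mean(np.exp(X @ tv)),                       # empirical E[e^{<t, X>}]
      np.exp(tv @ mu + 0.5 * tv @ Sigma @ tv))       # closed form
```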
Moment-generating functions are positive and log-convex, with $M(0) = 1$.
An important property of the moment-generating function is that it uniquely determines the distribution. In other words, if $X$ and $Y$ are two random variables and $M_X(t) = M_Y(t)$ for all values of $t$, then
$F_X(x) = F_Y(x)$
for all values of $x$ (or, equivalently, $X$ and $Y$ have the same distribution). This statement is not equivalent to the statement "if two distributions have the same moments, then they are identical at all points." This is because in some cases the moments exist and yet the moment-generating function does not, because the limit
$\lim_{n \to \infty} \sum_{i=0}^{n} \frac{t^i m_i}{i!}$
may not exist. The log-normal distribution is an example of when this occurs.
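A small numeric sketch of the log-normal case (the standard log-normal and the value $t = 0.01$ are my own assumptions): every moment is finite, yet the integrand of $\operatorname{E}[e^{tX}]$ eventually grows without bound for any $t > 0$, so the moment-generating function does not exist.

```python
# Sketch: for a standard log-normal, all moments are finite, but E[e^{tX}] diverges for
# every t > 0: the log of the integrand, t*x + log f(x), eventually grows without bound
# because the linear term t*x beats the -(log x)^2 / 2 decay of the density.
import numpy as np
from scipy import stats

lognorm = stats.lognorm(s=1.0)
print([lognorm.moment(n) for n in (1, 2, 3)])    # ~ [1.65, 7.39, 90.0], all finite

t = 0.01
for x in (1e2, 1e3, 1e4, 1e5):
    print(x, t * x + lognorm.logpdf(x))          # dips at first, then diverges to +infinity
```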
The moment-generating function is so called because if it exists on an open interval around $t = 0$, then it is the exponential generating function of the moments of the probability distribution:
$M_X(t) = \sum_{n=0}^{\infty} \frac{m_n t^n}{n!}.$
That is, with $n$ being a nonnegative integer, the $n$th moment about 0 is the $n$th derivative of the moment-generating function, evaluated at $t = 0$:
$m_n = M_X^{(n)}(0) = \left.\frac{\mathrm{d}^n M_X}{\mathrm{d}t^n}\right|_{t=0}.$
Jensen's inequality provides a simple lower bound on the moment-generating function:
$M_X(t) \geq e^{\mu t},$
where $\mu$ is the mean of $X$.
The moment-generating function can be used in conjunction with Markov's inequality to bound the upper tail of a real random variable $X$. This statement is also called the Chernoff bound. Since $e^{xt}$ is monotonically increasing in $x$ for $t > 0$, we have
$P(X \geq a) = P\left(e^{tX} \geq e^{ta}\right) \leq e^{-at}\operatorname{E}\left[e^{tX}\right]$
for any $t > 0$ and any $a$, provided $M_X(t)$ exists. For example, when $X$ is a standard normal distribution and $a > 0$, we can choose $t = a$ and recall that $\operatorname{E}\left[e^{tX}\right] = e^{t^2/2}$. This gives $P(X \geq a) \leq e^{-a^2/2}$, which captures the correct exponential decay; for large $a$ it exceeds the exact tail probability only by a factor of order $a$.
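A quick numeric comparison (using SciPy's normal survival function) of this Chernoff bound with the exact standard normal tail:

```python
# Sketch: Chernoff bound e^{-a^2/2} versus the exact standard normal tail P(X >= a);
# the ratio grows roughly linearly in a, consistent with the discussion above.
import numpy as np
from scipy import stats

for a in (1.0, 2.0, 4.0, 8.0):
    bound = np.exp(-a**2 / 2)
    exact = stats.norm.sf(a)
    print(a, bound, exact, bound / exact)
```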
Various lemmas, such as Hoeffding's lemma or Bennett's inequality, provide bounds on the moment-generating function in the case of a zero-mean, bounded random variable.
When $X$ is non-negative, the moment-generating function gives a simple, useful bound on the moments:
$\operatorname{E}\left[X^n\right] \leq \left(\frac{n}{te}\right)^n M_X(t),$
for any $t > 0$ and $n > 0$.
This follows from the inequality $1 + x' \leq e^{x'}$: substituting $x' = tx/n - 1$ gives $tx/n \leq e^{tx/n - 1}$ for any real $x$ and any $t, n > 0$. If, in addition, $x \geq 0$, both sides are non-negative and can be raised to the $n$th power, which rearranges to $x^n \leq (n/(te))^n e^{tx}$. Taking the expectation on both sides gives the bound on $\operatorname{E}\left[X^n\right]$ in terms of $M_X(t) = \operatorname{E}\left[e^{tX}\right]$.
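To make the bound concrete, here is a minimal sketch for an assumed Exp(1) random variable, where $\operatorname{E}[X^n] = n!$ and $M_X(t) = 1/(1-t)$ for $t < 1$:

```python
# Sketch: check E[X^n] <= (n / (t e))^n * M_X(t) for X ~ Exp(1), where E[X^n] = n!
# and M_X(t) = 1 / (1 - t) for t < 1; t = n / (n + 1) minimizes the right-hand side.
import math

for n in (1, 2, 3, 4):
    t = n / (n + 1.0)
    bound = (n / (t * math.e)) ** n / (1.0 - t)
    print(n, math.factorial(n), bound)           # exact moment vs (larger) bound
```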
As an example, consider $X \sim \chi^2_k$, the chi-squared distribution with $k$ degrees of freedom. Then, from the examples above, $M_X(t) = (1 - 2t)^{-k/2}$. Picking $t = n/(2n + k)$ and substituting into the bound gives
$\operatorname{E}\left[X^n\right] \leq \left(1 + \frac{2n}{k}\right)^{k/2} e^{-n} (k + 2n)^n.$
We know that in this case the correct bound is $\operatorname{E}\left[X^n\right] \leq 2^n \Gamma(n + k/2)/\Gamma(k/2)$. To compare the bounds, we can consider the asymptotics for large $k$. Here the moment-generating function bound is $k^n\left(1 + \frac{n^2}{k} + O\!\left(\frac{1}{k^2}\right)\right)$, whereas the true bound is $k^n\left(1 + \frac{n^2 - n}{k} + O\!\left(\frac{1}{k^2}\right)\right)$. The moment-generating function bound is thus very strong in this case.
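The comparison in this example can also be done numerically; the sketch below evaluates both sides for one illustrative choice of $k$ and $n$ (values picked here, not from the text):

```python
# Sketch: chi-squared moment bound (1 + 2n/k)^(k/2) * e^{-n} * (k + 2n)^n, obtained from
# M_X(t) = (1 - 2t)^{-k/2} with t = n / (2n + k), versus the exact moment
# E[X^n] = 2^n * Gamma(n + k/2) / Gamma(k/2).
import math

k, n = 10, 3
mgf_bound = (1 + 2 * n / k) ** (k / 2) * math.exp(-n) * (k + 2 * n) ** n
exact = 2 ** n * math.gamma(n + k / 2) / math.gamma(k / 2)
print(mgf_bound, exact)      # ~2139 vs 1680: the bound exceeds the exact moment only modestly
```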
Related to the moment-generating function are a number of other transforms that are common in probability theory:
- The characteristic function $\varphi_X(t) = \operatorname{E}\left[e^{itX}\right]$, which equals $M_X(it)$ when the moment-generating function exists and, unlike it, always exists.
- The cumulant-generating function, usually defined as the logarithm of the moment-generating function.
- The probability-generating function $G(z) = \operatorname{E}\left[z^X\right]$, for which $G(e^t) = M_X(t)$ wherever both exist.
- The factorial moment generating function $\operatorname{E}\left[t^X\right]$.
In probability theory and statistics, the cumulative distribution function (CDF) of a real-valued random variable $X$, or just distribution function of $X$, evaluated at $x$, is the probability that $X$ will take a value less than or equal to $x$.
The Cauchy distribution, named after Augustin-Louis Cauchy, is a continuous probability distribution. It is also known, especially among physicists, as the Lorentz distribution, Cauchy–Lorentz distribution, Lorentz(ian) function, or Breit–Wigner distribution. The Cauchy distribution is the distribution of the x-intercept of a ray issuing from a fixed point with a uniformly distributed angle. It is also the distribution of the ratio of two independent normally distributed random variables with mean zero.
In probability theory, the expected value is a generalization of the weighted average. Informally, the expected value is the mean of the possible values a random variable can take, weighted by the probability of those outcomes. Since it is obtained through arithmetic, the expected value sometimes may not even be included in the sample data set; it is not the value you would "expect" to get in reality.
In probability theory and statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is
$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}.$
The parameter $\mu$ is the mean or expectation of the distribution, while the parameter $\sigma^2$ is the variance. The standard deviation of the distribution is $\sigma$ (sigma). A random variable with a Gaussian distribution is said to be normally distributed, and is called a normal deviate.
Independence is a fundamental notion in probability theory, as in statistics and the theory of stochastic processes. Two events are independent, statistically independent, or stochastically independent if, informally speaking, the occurrence of one does not affect the probability of occurrence of the other or, equivalently, does not affect the odds. Similarly, two random variables are independent if the realization of one does not affect the probability distribution of the other.
In probability theory, the central limit theorem (CLT) states that, under appropriate conditions, the distribution of a normalized version of the sample mean converges to a standard normal distribution. This holds even if the original variables themselves are not normally distributed. There are several versions of the CLT, each applying in the context of different conditions.
In probability theory, a probability density function (PDF), density function, or density of an absolutely continuous random variable, is a function whose value at any given sample in the sample space can be interpreted as providing a relative likelihood that the value of the random variable would be equal to that sample. Probability density is the probability per unit length, in other words, while the absolute likelihood for a continuous random variable to take on any particular value is 0, the value of the PDF at two different samples can be used to infer, in any particular draw of the random variable, how much more likely it is that the random variable would be close to one sample compared to the other sample.
In probability, and statistics, a multivariate random variable or random vector is a list or vector of mathematical variables each of whose value is unknown, either because the value has not yet occurred or because there is imperfect knowledge of its value. The individual variables in a random vector are grouped together because they are all part of a single mathematical system — often they represent different properties of an individual statistical unit. For example, while a given person has a specific age, height and weight, the representation of these features of an unspecified person from within a group would be a random vector. Normally each element of a random vector is a real number.
In probability theory and statistics, the multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions. One definition is that a random vector is said to be k-variate normally distributed if every linear combination of its k components has a univariate normal distribution. Its importance derives mainly from the multivariate central limit theorem. The multivariate normal distribution is often used to describe, at least approximately, any set of (possibly) correlated real-valued random variables, each of which clusters around a mean value.
In probability theory, the probability generating function of a discrete random variable is a power series representation (the generating function) of the probability mass function of the random variable. Probability generating functions are often employed for their succinct description of the sequence of probabilities Pr(X = i) in the probability mass function for a random variable X, and to make available the well-developed theory of power series with non-negative coefficients.
In probability theory and statistics, the cumulants $\kappa_n$ of a probability distribution are a set of quantities that provide an alternative to the moments of the distribution. Any two probability distributions whose moments are identical will have identical cumulants as well, and vice versa.
In mathematics, the moments of a function are certain quantitative measures related to the shape of the function's graph. If the function represents mass density, then the zeroth moment is the total mass, the first moment is the center of mass, and the second moment is the moment of inertia. If the function is a probability distribution, then the first moment is the expected value, the second central moment is the variance, the third standardized moment is the skewness, and the fourth standardized moment is the kurtosis.
In signal processing, cross-correlation is a measure of similarity of two series as a function of the displacement of one relative to the other. This is also known as a sliding dot product or sliding inner-product. It is commonly used for searching a long signal for a shorter, known feature. It has applications in pattern recognition, single particle analysis, electron tomography, averaging, cryptanalysis, and neurophysiology. The cross-correlation is similar in nature to the convolution of two functions. In an autocorrelation, which is the cross-correlation of a signal with itself, there will always be a peak at a lag of zero, and its size will be the signal energy.
In probability theory and statistics, the Lévy distribution, named after Paul Lévy, is a continuous probability distribution for a non-negative random variable. In spectroscopy, this distribution, with frequency as the dependent variable, is known as a van der Waals profile. It is a special case of the inverse-gamma distribution. It is a stable distribution.
In probability theory and statistics, the continuous uniform distributions or rectangular distributions are a family of symmetric probability distributions. Such a distribution describes an experiment where there is an arbitrary outcome that lies between certain bounds. The bounds are defined by the parameters $a$ and $b$, which are the minimum and maximum values. The interval can either be closed or open. Therefore, the distribution is often abbreviated $U(a, b)$, where $U$ stands for uniform distribution. The difference between the bounds defines the interval length; all intervals of the same length on the distribution's support are equally probable. It is the maximum entropy probability distribution for a random variable under no constraint other than that it is contained in the distribution's support.
In statistics and information theory, a maximum entropy probability distribution has entropy that is at least as great as that of all other members of a specified class of probability distributions. According to the principle of maximum entropy, if nothing is known about a distribution except that it belongs to a certain class, then the distribution with the largest entropy should be chosen as the least-informative default. The motivation is twofold: first, maximizing entropy minimizes the amount of prior information built into the distribution; second, many physical systems tend to move towards maximal entropy configurations over time.
In probability theory and statistics, the characteristic function of any real-valued random variable completely defines its probability distribution. If a random variable admits a probability density function, then the characteristic function is the Fourier transform of the probability density function. Thus it provides an alternative route to analytical results compared with working directly with probability density functions or cumulative distribution functions. There are particularly simple results for the characteristic functions of distributions defined by the weighted sums of random variables.
In probability theory and statistics, the factorial moment generating function (FMGF) of the probability distribution of a real-valued random variable $X$ is defined as $M_X(t) = \operatorname{E}\left[t^X\right]$ for those values of $t$ for which this expectation exists.
In statistics, the multivariate t-distribution is a multivariate probability distribution. It is a generalization to random vectors of the Student's t-distribution, which is a distribution applicable to univariate random variables. While the case of a random matrix could be treated within this structure, the matrix t-distribution is distinct and makes particular use of the matrix structure.
In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time if these events occur with a known constant mean rate and independently of the time since the last event. It can also be used for the number of events in other types of intervals than time, and in dimension greater than 1.