In mathematics, the **moments** of a function are quantitative measures related to the shape of the function's graph. If the function represents mass, then the first moment is the center of mass, and the second moment is the rotational inertia. If the function is a probability distribution, then the first moment is the expected value, the second central moment is the variance, the third standardized moment is the skewness, and the fourth standardized moment is the kurtosis. The mathematical concept is closely related to the concept of moment in physics.

- Significance of the moments
- Mean
- Variance
- Standardized moments
- Higher moments
- Mixed moments
- Properties of moments
- Transformation of center
- The moment of a convolution of functions
- Cumulants
- Sample moments
- Problem of moments
- Partial moments
- Central moments in metric spaces
- See also
- References
- Further reading
- External links

For a distribution of mass or probability on a bounded interval, the collection of all the moments (of all orders, from 0 to ∞) uniquely determines the distribution (Hausdorff moment problem). The same is not true on unbounded intervals (Hamburger moment problem).

In the mid-nineteenth century, Pafnuty Chebyshev became the first person to think systematically in terms of the moments of random variables.[1]

The *n*-th raw moment (i.e., moment about zero) of a distribution is defined by[2]

$$\mu'_n = \langle x^n \rangle,$$

where

$$\langle f(x) \rangle = \begin{cases} \sum f(x)\,P(x), & \text{discrete distribution} \\ \int f(x)\,P(x)\,dx, & \text{continuous distribution} \end{cases}$$

The *n*-th moment of a real-valued continuous function *f*(*x*) of a real variable about a value *c* is the integral

$$\mu_n = \int_{-\infty}^{\infty} (x - c)^n\, f(x)\, dx.$$
It is possible to define moments for random variables in a more general fashion than moments for real-valued functions — see central moments in metric spaces below. The moment of a function, without further explanation, usually refers to the above expression with *c* = 0.

For the second and higher moments, the central moments (moments about the mean, with *c* being the mean) are usually used rather than the moments about zero, because they provide clearer information about the distribution's shape.

Other moments may also be defined. For example, the *n*-th inverse moment about zero is $\operatorname{E}\left[X^{-n}\right]$ and the *n*-th logarithmic moment about zero is $\operatorname{E}\left[\ln^n(X)\right]$.

The *n*-th moment about zero of a probability density function *f*(*x*) is the expected value of *X*^{n} and is called a *raw moment* or *crude moment*.[3] The moments about its mean *μ* are called *central* moments; these describe the shape of the function, independently of translation.

If *f* is a probability density function, then the value of the integral above is called the *n*-th moment of the probability distribution. More generally, if *F* is a cumulative probability distribution function of any probability distribution, which may not have a density function, then the *n*-th moment of the probability distribution is given by the Riemann–Stieltjes integral

$$\mu'_n = \operatorname{E}\left[X^n\right] = \int_{-\infty}^{\infty} x^n\, dF(x),$$

where *X* is a random variable that has this cumulative distribution *F*, and E is the expectation operator or mean. When

$$\operatorname{E}\left[\left|X^n\right|\right] = \int_{-\infty}^{\infty} |x|^n\, dF(x) = \infty,$$

the moment is said not to exist. If the *n*-th moment about any point exists, so does the (*n* − 1)-th moment (and thus, all lower-order moments) about every point.

The zeroth moment of any probability density function is 1, since the area under any probability density function must be equal to one.
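
As a minimal numerical sketch of the definitions above (assuming NumPy and SciPy are available; the normal density and its parameters are purely illustrative), the moments about a point *c* can be computed by direct integration:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Hypothetical example density: a normal distribution with mu = 2.0, sigma = 1.5.
mu, sigma = 2.0, 1.5
f = norm(mu, sigma).pdf

def moment(n, c=0.0):
    """n-th moment of the density f about the point c, by numerical integration."""
    value, _ = quad(lambda x: (x - c) ** n * f(x), -np.inf, np.inf)
    return value

print(moment(0))         # zeroth moment: ~1.0, the total probability
print(moment(1))         # first raw moment: ~2.0, the mean
print(moment(2, c=mu))   # second central moment: ~2.25, the variance sigma**2
```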

| Moment ordinal | Raw moment | Central moment | Standardized moment | Raw cumulant | Normalized cumulant |
|---|---|---|---|---|---|
| 1 | Mean | 0 | 0 | Mean | N/A |
| 2 | – | Variance | 1 | Variance | 1 |
| 3 | – | – | Skewness | – | Skewness |
| 4 | – | – | (Non-excess or historical) kurtosis | – | Excess kurtosis |
| 5 | – | – | Hyperskewness | – | – |
| 6 | – | – | Hypertailedness | – | – |
| 7+ | – | – | – | – | – |

The first raw moment is the mean, usually denoted $\mu \equiv \operatorname{E}[X]$.

The second central moment is the variance. The positive square root of the variance is the standard deviation $\sigma \equiv \left(\operatorname{E}\left[(X - \mu)^2\right]\right)^{1/2}$.

The *normalised* *n*-th central moment or standardised moment is the *n*-th central moment divided by σ^{n}; the normalised *n*-th central moment of the random variable *X* is

$$\frac{\mu_n}{\sigma^n} = \frac{\operatorname{E}\left[(X - \mu)^n\right]}{\sigma^n}.$$

These normalised central moments are dimensionless quantities, which represent the distribution independently of any linear change of scale.
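
A quick empirical check of this scale invariance (a sketch; the exponential sample and the factor 7 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)   # arbitrary skewed sample

def standardized_moment(data, n):
    """n-th central moment divided by sigma**n (population convention, ddof=0)."""
    d = data - data.mean()
    return (d ** n).mean() / data.std() ** n

# Rescaling the data by any positive factor leaves the values unchanged.
for sample in (x, 7.0 * x):
    print(standardized_moment(sample, 3), standardized_moment(sample, 4))
```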

For an electric signal, the first moment is its DC level, and the second moment is proportional to its average power.[4][5]

The third central moment is a measure of the lopsidedness of the distribution; any symmetric distribution will have a third central moment, if defined, of zero. The normalised third central moment is called the skewness, often denoted γ. A distribution that is skewed to the left (the tail of the distribution is longer on the left) will have a negative skewness. A distribution that is skewed to the right (the tail of the distribution is longer on the right) will have a positive skewness.

For distributions that are not too different from the normal distribution, the median will be somewhere near *μ* − *γσ*/6; the mode about *μ* − *γσ*/2.
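
A rough check of the median rule of thumb (a sketch assuming SciPy; a gamma distribution with shape 10 is used here only because it is moderately skewed yet close to normal):

```python
import numpy as np
from scipy.stats import gamma

k = 10.0                      # shape parameter; gamma(10) is moderately skewed
dist = gamma(k)
mu, sigma = dist.mean(), dist.std()
g = 2.0 / np.sqrt(k)          # skewness of a gamma distribution

print(dist.median())          # exact median, ~9.669
print(mu - g * sigma / 6.0)   # approximation mu - gamma*sigma/6, ~9.667
```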

The fourth central moment is a measure of the heaviness of the tail of the distribution, compared to the normal distribution of the same variance. Since it is the expectation of a fourth power, the fourth central moment, where defined, is always nonnegative; and except for a point distribution, it is always strictly positive. The fourth central moment of a normal distribution is 3*σ*^{4}.

The kurtosis κ is defined to be the standardized fourth central moment, $\kappa = \operatorname{E}\left[(X - \mu)^4\right]/\sigma^4$. (Equivalently, as discussed below, excess kurtosis is the fourth cumulant divided by the square of the second cumulant.)[6][7] If a distribution has heavy tails, the kurtosis will be high (sometimes called leptokurtic); conversely, light-tailed distributions (for example, bounded distributions such as the uniform) have low kurtosis (sometimes called platykurtic).

The kurtosis can be positive without limit, but κ must be greater than or equal to *γ*^{2} + 1; equality only holds for binary distributions. For unbounded skew distributions not too far from normal, κ tends to be somewhere in the area of *γ*^{2} and 2*γ*^{2}.

The inequality can be proven by considering

$$\operatorname{E}\left[\left(T^2 - aT - 1\right)^2\right] \ge 0,$$

where *T* = (*X* − *μ*)/*σ*. This is the expectation of a square, so it is non-negative for all *a*; however it is also a quadratic polynomial in *a*. Its discriminant must be non-positive, which gives the required relationship.
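
In detail (a step the text leaves implicit): expanding the square and using E[*T*] = 0, E[*T*²] = 1, E[*T*³] = γ, and E[*T*⁴] = κ gives

$$\operatorname{E}\left[\left(T^2 - aT - 1\right)^2\right] = \operatorname{E}\left[T^4\right] - 2a\operatorname{E}\left[T^3\right] + (a^2 - 2)\operatorname{E}\left[T^2\right] + 2a\operatorname{E}[T] + 1 = a^2 - 2\gamma a + (\kappa - 1),$$

a quadratic in *a* that is non-negative for all *a*, so its discriminant satisfies 4*γ*² − 4(*κ* − 1) ≤ 0, i.e. *κ* ≥ *γ*² + 1.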

**High-order moments** are moments beyond 4th-order moments.

As with variance, skewness, and kurtosis, these are higher-order statistics, involving non-linear combinations of the data, and can be used for description or estimation of further shape parameters. The higher the moment, the harder it is to estimate, in the sense that larger samples are required in order to obtain estimates of similar quality. This is due to the excess degrees of freedom consumed by the higher orders. Further, they can be subtle to interpret, often being most easily understood in terms of lower order moments – compare the higher-order derivatives of jerk and jounce in physics. For example, just as the 4th-order moment (kurtosis) can be interpreted as "relative importance of tails as compared to shoulders in contribution to dispersion" (for a given amount of dispersion, higher kurtosis corresponds to thicker tails, while lower kurtosis corresponds to broader shoulders), the 5th-order moment can be interpreted as measuring "relative importance of tails as compared to center (mode and shoulders) in contribution to skewness" (for a given amount of skewness, higher 5th moment corresponds to higher skewness in the tail portions and little skewness of mode, while lower 5th moment corresponds to more skewness in shoulders).

**Mixed moments** are moments involving multiple variables.

The value $\operatorname{E}\left[X^k\right]$ is called the moment of order $k$ (moments are also defined for non-integral $k$). The moments of the joint distribution of random variables $X_1, \ldots, X_n$ are defined similarly. For any integers $k_i \ge 0$, the mathematical expectation $\operatorname{E}\left[X_1^{k_1} \cdots X_n^{k_n}\right]$ is called a mixed moment of order $k = k_1 + \cdots + k_n$, and $\operatorname{E}\left[(X_1 - \operatorname{E}[X_1])^{k_1} \cdots (X_n - \operatorname{E}[X_n])^{k_n}\right]$ is called a central mixed moment of order $k$. The mixed moment $\operatorname{E}\left[(X_1 - \operatorname{E}[X_1])(X_2 - \operatorname{E}[X_2])\right]$ is called the covariance and is one of the basic characteristics of dependency between random variables.

Some examples are covariance, coskewness and cokurtosis. While there is a unique covariance, there are multiple co-skewnesses and co-kurtoses.

Since

$$(x - b)^n = (x - a + a - b)^n = \sum_{i=0}^{n} \binom{n}{i} (x - a)^i (a - b)^{n-i},$$

where $\binom{n}{i}$ is the binomial coefficient, it follows that the moments about *b* can be calculated from the moments about *a* by:

$$\mu_n(b) = \sum_{i=0}^{n} \binom{n}{i} \mu_i(a)\, (a - b)^{n-i}.$$
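
A numerical check of this change-of-center identity (a sketch reusing the illustrative normal density and quad-based moments from earlier; *a*, *b*, and *n* are arbitrary):

```python
import numpy as np
from math import comb
from scipy.integrate import quad
from scipy.stats import norm

f = norm(2.0, 1.5).pdf   # illustrative density

def moment(n, c=0.0):
    """n-th moment of f about c, by numerical integration."""
    value, _ = quad(lambda x: (x - c) ** n * f(x), -np.inf, np.inf)
    return value

a, b, n = 1.0, 3.0, 4
direct = moment(n, c=b)
via_a = sum(comb(n, i) * moment(i, c=a) * (a - b) ** (n - i) for i in range(n + 1))
print(direct, via_a)     # the two values agree up to integration error
```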

The moment of a convolution $h(t) = (f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$ reads

$$\mu_n[h] = \sum_{i=0}^{n} \binom{n}{i} \mu_i[f]\, \mu_{n-i}[g],$$

where $\mu_n[\,\cdot\,]$ denotes the $n$-th moment of the function given in the brackets. This identity follows from the convolution theorem for moment-generating functions and the chain rule for differentiating a product.
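
Since the density of a sum of independent random variables is the convolution of their densities, the identity can be sanity-checked by Monte Carlo (a sketch; the two distributions and sample size are arbitrary choices):

```python
import numpy as np
from math import comb

rng = np.random.default_rng(1)
x = rng.normal(1.0, 2.0, size=1_000_000)        # X ~ f
y = rng.exponential(scale=3.0, size=1_000_000)  # Y ~ g, independent of X

n = 3
lhs = np.mean((x + y) ** n)  # n-th raw moment of the convolution f*g
rhs = sum(comb(n, i) * np.mean(x ** i) * np.mean(y ** (n - i))
          for i in range(n + 1))
print(lhs, rhs)              # agree up to sampling error
```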

The first raw moment and the second and third *unnormalized central* moments are additive in the sense that if *X* and *Y* are independent random variables then

$$\begin{aligned}
\operatorname{E}[X + Y] &= \operatorname{E}[X] + \operatorname{E}[Y], \\
\operatorname{Var}(X + Y) &= \operatorname{Var}(X) + \operatorname{Var}(Y), \\
\mu_3(X + Y) &= \mu_3(X) + \mu_3(Y).
\end{aligned}$$

(These can also hold for variables that satisfy weaker conditions than independence. The first always holds; if the second holds, the variables are called uncorrelated.)

In fact, these are the first three cumulants and all cumulants share this additivity property.
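
The additivity can be illustrated with SciPy's k-statistics, which are unbiased estimators of the first cumulants (a sketch; the distributions and sample size are arbitrary, and agreement is only up to sampling error):

```python
import numpy as np
from scipy.stats import kstat

rng = np.random.default_rng(2)
x = rng.gamma(2.0, 1.0, size=500_000)
y = rng.normal(0.0, 1.0, size=500_000)

# k-statistics: cumulant of the sum vs. sum of the cumulants.
for n in (1, 2, 3):
    print(n, kstat(x + y, n), kstat(x, n) + kstat(y, n))
```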

For all *k*, the *k*-th raw moment of a population can be estimated using the *k*-th raw sample moment

$$\frac{1}{n} \sum_{i=1}^{n} X_i^k$$

applied to a sample *X*_{1}, ..., *X*_{n} drawn from the population.

It can be shown that the expected value of the raw sample moment is equal to the *k*-th raw moment of the population, if that moment exists, for any sample size *n*. It is thus an unbiased estimator. This contrasts with the situation for central moments, whose computation uses up a degree of freedom by using the sample mean. So, for example, an unbiased estimate of the population variance (the second central moment) is given by

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2,$$

in which the previous denominator *n* has been replaced by the degrees of freedom *n* − 1, and in which $\bar{X}$ refers to the sample mean. This estimate of the population moment is greater than the unadjusted observed sample moment by a factor of $\tfrac{n}{n-1}$, and it is referred to as the "adjusted sample variance" or sometimes simply the "sample variance".
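
In NumPy the two estimators correspond to the `ddof` argument of `var` (a minimal sketch; the sample is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
sample = rng.normal(0.0, 2.0, size=10)
n = sample.size

biased = sample.var(ddof=0)      # second central sample moment, denominator n
unbiased = sample.var(ddof=1)    # adjusted sample variance, denominator n - 1
print(unbiased, biased * n / (n - 1))   # identical: the n/(n-1) adjustment
```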

The problem of determining a probability distribution from its sequence of moments is called the *problem of moments*. Such problems were first discussed by P. L. Chebyshev (1874)[8] in connection with research on limit theorems. In order that the probability distribution of a random variable be uniquely defined by its moments it is sufficient, for example, that Carleman's condition be satisfied:

$$\sum_{n=1}^{\infty} \frac{1}{\left(\mu'_{2n}\right)^{1/(2n)}} = \infty.$$
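
As a concrete sketch, Carleman's condition can be checked numerically for the standard normal, whose even raw moments are $\mu'_{2n} = (2n)!/(2^n\, n!)$; working with logarithms avoids overflow, and the summands decay only like $n^{-1/2}$, so the partial sums grow without bound:

```python
import numpy as np
from scipy.special import gammaln

# Even raw moments of the standard normal: mu'_{2n} = (2n)! / (2**n * n!).
n = np.arange(1, 100_001)
log_mu_2n = gammaln(2 * n + 1) - n * np.log(2.0) - gammaln(n + 1)
terms = np.exp(-log_mu_2n / (2 * n))    # the summands (mu'_{2n})**(-1/(2n))

partial_sums = np.cumsum(terms)
print(partial_sums[[99, 9_999, 99_999]])  # keep growing: the condition holds
```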

A similar result even holds for moments of random vectors. The *problem of moments* seeks characterizations of sequences $\{\mu'_n : n = 1, 2, 3, \dots\}$ that are sequences of moments of some function *f*. Suppose $(F_n)$ is a sequence of distribution functions, all moments of which are finite, and that for each integer $k \ge 1$

$$\mu_k^{(n)} \to \mu_k \quad \text{as } n \to \infty,$$

where the limits $\mu_k$ are finite. Then there is a subsequence of $(F_n)$ that weakly converges to a distribution function $F$ having the $\mu_k$ as its moments. If the moments determine $F$ uniquely, then the whole sequence $(F_n)$ weakly converges to $F$.

Partial moments are sometimes referred to as "one-sided moments." The *n*-th order lower and upper partial moments with respect to a reference point *r* may be expressed as

$$\mu_n^-(r) = \int_{-\infty}^{r} (r - x)^n\, f(x)\, dx,$$

$$\mu_n^+(r) = \int_{r}^{\infty} (x - r)^n\, f(x)\, dx.$$

If the integrals do not converge, the partial moments do not exist.

Partial moments are normalized by being raised to the power 1/*n*. The upside potential ratio may be expressed as a ratio of a first-order upper partial moment to a normalized second-order lower partial moment. They have been used in the definition of some financial metrics, such as the Sortino ratio, as they focus purely on upside or downside.
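
A minimal sketch of empirical partial moments and the two ratios mentioned above (the return series and the reference point *r* are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
returns = rng.normal(0.01, 0.05, size=1_000)   # hypothetical periodic returns
r = 0.0                                        # reference (target) return

def lower_partial_moment(x, r, n):
    return np.mean(np.maximum(r - x, 0.0) ** n)

def upper_partial_moment(x, r, n):
    return np.mean(np.maximum(x - r, 0.0) ** n)

downside = lower_partial_moment(returns, r, 2) ** 0.5   # normalized (power 1/2)
sortino = (returns.mean() - r) / downside
upside_potential = upper_partial_moment(returns, r, 1) / downside
print(sortino, upside_potential)
```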

Let (*M*, *d*) be a metric space, and let B(*M*) be the Borel σ-algebra on *M*, the σ-algebra generated by the *d*-open subsets of *M*. (For technical reasons, it is also convenient to assume that *M* is a separable space with respect to the metric *d*.) Let 1 ≤ *p* ≤ ∞.

The **p-th central moment** of a measure μ on the measurable space (*M*, B(*M*)) about a given point *x*_{0} ∈ *M* is defined to be

$$\int_M d(x, x_0)^p\, \mathrm{d}\mu(x).$$

*μ* is said to have **finite p-th central moment** if the p-th central moment of μ about *x*_{0} is finite for some *x*_{0} ∈ *M*.

This terminology for measures carries over to random variables in the usual way: if (Ω, Σ, **P**) is a probability space and *X* : Ω → *M* is a random variable, then the **p-th central moment** of *X* about *x*_{0} ∈ *M* is defined to be

$$\operatorname{E}\left[d(X, x_0)^p\right],$$

and *X* has **finite p-th central moment** if the p-th central moment of *X* about *x*_{0} is finite for some *x*_{0} ∈ *M*.
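
A minimal sketch of this definition for an empirical measure, taking *M* = **R**² with the Euclidean metric (the sample and the choice *x*₀ = 0 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
points = rng.normal(size=(10_000, 2))   # empirical measure on M = R^2
x0 = np.zeros(2)
p = 2

distances = np.linalg.norm(points - x0, axis=1)   # d(x, x0), Euclidean metric
print(np.mean(distances ** p))   # ~2.0 for a standard bivariate normal
```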

- Energy (signal processing)
- Factorial moment
- Generalised mean
- Image moment
- L-moment
- Method of moments (probability theory)
- Method of moments (statistics)
- Moment-generating function
- Moment measure
- Second moment method
- Standardised moment
- Stieltjes moment problem
- Taylor expansions for the moments of functions of random variables


- Text was copied from Moment at the Encyclopedia of Mathematics, which is released under a Creative Commons Attribution-Share Alike 3.0 (Unported) (CC-BY-SA 3.0) license and the GNU Free Documentation License.

1. ↑ George Mackey (July 1980). "Harmonic Analysis as the Exploitation of Symmetry – A Historical Survey". *Bulletin of the American Mathematical Society*. New Series. **3** (1): 549.
2. ↑ Papoulis, A. (1984). *Probability, Random Variables, and Stochastic Processes* (2nd ed.). New York: McGraw Hill. pp. 145–149.
3. ↑ "Raw Moments". MathWorld. Archived from the original on 2009-05-28. Retrieved 2009-06-24.
4. ↑ Clive Maxfield; John Bird; Tim Williams; Walt Kester; Dan Bensky (2011). *Electrical Engineering: Know It All*. Newnes. p. 884. ISBN 978-0-08-094966-6.
5. ↑ Ha H. Nguyen; Ed Shwedyk (2009). *A First Course in Digital Communications*. Cambridge University Press. p. 87. ISBN 978-0-521-87613-1.
6. ↑ Casella, George; Berger, Roger L. (2002). *Statistical Inference* (2nd ed.). Pacific Grove: Duxbury. ISBN 0-534-24312-6.
7. ↑ Ballanda, Kevin P.; MacGillivray, H. L. (1988). "Kurtosis: A Critical Review". *The American Statistician*. **42** (2): 111–119. doi:10.2307/2684482. JSTOR 2684482.
8. ↑ Feller, W. (1957–1971). *An Introduction to Probability Theory and Its Applications*. New York: John Wiley & Sons.

- Spanos, Aris (1999). *Probability Theory and Statistical Inference*. New York: Cambridge University Press. pp. 109–130. ISBN 0-521-42408-9.
- Walker, Helen M. (1929). *Studies in the History of Statistical Method, with Special Reference to Certain Educational Problems*. Baltimore: Williams & Wilkins Co. p. 71.

- "Moment",
*Encyclopedia of Mathematics*, EMS Press, 2001 [1994] - Moments at Mathworld
