In statistics, L-moments are a sequence of statistics used to summarize the shape of a probability distribution. [1] [2] [3] [4] They are linear combinations of order statistics (L-statistics) analogous to conventional moments, and can be used to calculate quantities analogous to standard deviation, skewness and kurtosis, termed the L-scale, L-skewness and L-kurtosis respectively (the L-mean is identical to the conventional mean). Standardized L-moments are called L-moment ratios and are analogous to standardized moments. Just as for conventional moments, a theoretical distribution has a set of population L-moments. Sample L-moments can be defined for a sample from the population, and can be used as estimators of the population L-moments.
For a random variable X, the rth population L-moment is [1]

\[ \lambda_r = \frac{1}{r} \sum_{k=0}^{r-1} (-1)^k \binom{r-1}{k} \operatorname{E}[X_{r-k:r}], \qquad r = 1, 2, \ldots \]

where X_{k:n} denotes the kth order statistic (kth smallest value) in an independent sample of size n from the distribution of X and E denotes the expected value. In particular, the first four population L-moments are

\[ \lambda_1 = \operatorname{E}[X_{1:1}] \]
\[ \lambda_2 = \tfrac{1}{2} \operatorname{E}[X_{2:2} - X_{1:2}] \]
\[ \lambda_3 = \tfrac{1}{3} \operatorname{E}[X_{3:3} - 2X_{2:3} + X_{1:3}] \]
\[ \lambda_4 = \tfrac{1}{4} \operatorname{E}[X_{4:4} - 3X_{3:4} + 3X_{2:4} - X_{1:4}] \]
Note that the coefficients of the kth L-moment are the same as those in the kth term of the binomial transform, as used in the kth-order finite difference (the finite analogue of the derivative).
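As a concrete illustration of these definitions, consider the exponential distribution with rate λ, for which the expected order statistics have the closed form E[X_{k:n}] = (1/λ) Σ_{j=n−k+1}^{n} 1/j. Substituting into the formulas above gives

\[
\lambda_1 = \frac{1}{\lambda}, \qquad
\lambda_2 = \frac{1}{2}\left(\frac{3}{2\lambda} - \frac{1}{2\lambda}\right) = \frac{1}{2\lambda}, \qquad
\lambda_3 = \frac{1}{3}\left(\frac{11}{6\lambda} - \frac{10}{6\lambda} + \frac{2}{6\lambda}\right) = \frac{1}{6\lambda},
\]

so that the ratio λ3/λ2 (the L-skewness defined below) equals 1/3, in agreement with the exponential entry in the table later in the article.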
The first two of these L-moments have conventional names: λ1 is also called the L-location (it equals the mean), and λ2 is called the L-scale. The L-scale is equal to half the mean absolute difference. [5]
The sample L-moments can be computed as the population L-moments of the sample, summing over the r-element subsets of the sample and hence averaging by dividing by the binomial coefficient:

\[ \ell_r = \binom{n}{r}^{-1} \sum_{1 \le i_1 < i_2 < \cdots < i_r \le n} \frac{1}{r} \sum_{k=0}^{r-1} (-1)^k \binom{r-1}{k} x_{i_{r-k}:n} \]
Grouping these by order statistic counts the number of ways an element of an n-element sample can be the jth element of an r-element subset, and yields formulas of the form below. Direct estimators for the first four L-moments in a finite sample of n observations are: [6]

\[ \ell_1 = \binom{n}{1}^{-1} \sum_{i=1}^{n} x_{(i)} \]
\[ \ell_2 = \tfrac{1}{2} \binom{n}{2}^{-1} \sum_{i>j} \left( x_{(i)} - x_{(j)} \right) \]
\[ \ell_3 = \tfrac{1}{3} \binom{n}{3}^{-1} \sum_{i>j>k} \left( x_{(i)} - 2 x_{(j)} + x_{(k)} \right) \]
\[ \ell_4 = \tfrac{1}{4} \binom{n}{4}^{-1} \sum_{i>j>k>l} \left( x_{(i)} - 3 x_{(j)} + 3 x_{(k)} - x_{(l)} \right) \]

where x_{(i)} is the ith order statistic and \(\binom{n}{r}\) is a binomial coefficient. Sample L-moments can also be defined indirectly in terms of probability weighted moments, [1] [7] [8] which leads to a more efficient algorithm for their computation. [6] [9]
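As a computational sketch of the probability-weighted-moment route just mentioned (the function name and structure below are illustrative, not taken from the cited sources; NumPy is assumed), the first four sample L-moments can be obtained from the unbiased PWM estimators b0, ..., b3 of the sorted sample:

```python
import numpy as np

def sample_lmoments(data):
    """First four sample L-moments via unbiased probability weighted moments.

    A minimal sketch: computes b0..b3 from the sorted sample and combines
    them linearly into l1..l4.
    """
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)  # ranks 1..n of the sorted sample
    b0 = x.mean()
    b1 = np.sum((i - 1) * x) / (n * (n - 1))
    b2 = np.sum((i - 1) * (i - 2) * x) / (n * (n - 1) * (n - 2))
    b3 = np.sum((i - 1) * (i - 2) * (i - 3) * x) / (n * (n - 1) * (n - 2) * (n - 3))
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3, l4

# Example: for a large exponential sample with rate 1, expect l1 ≈ 1 and l2 ≈ 0.5.
rng = np.random.default_rng(0)
print(sample_lmoments(rng.exponential(size=100_000)))
```

Sorting dominates the cost, so this runs in O(n log n) time rather than the O(n^4) of the direct subset sums above.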
A set of L-moment ratios, or scaled L-moments, is defined by

\[ \tau_r = \lambda_r / \lambda_2, \qquad r = 3, 4, \ldots \]

The most useful of these are τ3, called the L-skewness, and τ4, the L-kurtosis.
L-moment ratios lie within the interval (−1, 1). Tighter bounds can be found for some specific L-moment ratios; in particular, the L-kurtosis τ4 lies in [−¼, 1) and satisfies τ4 ≥ ¼(5τ3² − 1).
A quantity analogous to the coefficient of variation, but based on L-moments, can also be defined: τ = λ2 / λ1, which is called the "coefficient of L-variation", or "L-CV". For a non-negative random variable, this lies in the interval (0, 1) [1] and is identical to the Gini coefficient. [10]
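Continuing the earlier computational sketch (and reusing its hypothetical sample_lmoments function and rng generator), the sample versions of these ratios follow directly from the sample L-moments:

```python
# Sample L-moment ratios, reusing the sample_lmoments sketch from above.
def sample_lmoment_ratios(data):
    l1, l2, l3, l4 = sample_lmoments(data)
    return {
        "L-CV": l2 / l1,        # coefficient of L-variation (non-negative data)
        "L-skewness": l3 / l2,  # tau_3
        "L-kurtosis": l4 / l2,  # tau_4
    }

# For a large exponential sample, expect roughly L-CV ≈ 1/2, tau_3 ≈ 1/3, tau_4 ≈ 1/6.
print(sample_lmoment_ratios(rng.exponential(size=100_000)))
```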
L-moments are statistical quantities that are derived from probability weighted moments [11] (PWM), which were defined earlier (1979). [7] PWMs are used to efficiently estimate the parameters of distributions expressible in inverse form, such as the Gumbel, [8] the Tukey lambda, and the Wakeby distributions.
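For reference, writing the probability weighted moments as β_r = E[X F(X)^r], where F is the cumulative distribution function of X, the first four L-moments are the linear combinations

\[
\lambda_1 = \beta_0, \qquad
\lambda_2 = 2\beta_1 - \beta_0, \qquad
\lambda_3 = 6\beta_2 - 6\beta_1 + \beta_0, \qquad
\lambda_4 = 20\beta_3 - 30\beta_2 + 12\beta_1 - \beta_0,
\]

which is the identity behind the efficient PWM-based computation of sample L-moments mentioned above.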
There are two common ways that L-moments are used, in both cases analogously to the conventional moments:
1. As summary statistics for data.
2. To derive estimators for the parameters of probability distributions, by applying the method of moments to the L-moments rather than to the conventional moments.
In addition to doing these with standard moments, the latter (estimation) is more commonly done using maximum likelihood methods; however, using L-moments provides a number of advantages. Specifically, L-moments are more robust than conventional moments, and the existence of higher L-moments only requires that the random variable have a finite mean. One disadvantage of L-moment ratios for estimation is their typically smaller sensitivity. For instance, the Laplace distribution has a kurtosis of 6 and weak exponential tails, but a larger fourth L-moment ratio than, for example, Student's t distribution with 3 degrees of freedom, which has infinite kurtosis and much heavier tails.
As an example, consider a dataset with a few data points and one outlying data value. If the ordinary standard deviation of this data set is taken, it will be highly influenced by this one point; however, if the L-scale is taken, it will be far less sensitive to this data value, as the short numerical sketch below illustrates. Consequently, L-moments are far more meaningful when dealing with outliers in data than conventional moments. However, there are also other methods better suited to achieving even higher robustness than simply replacing moments by L-moments. One example of this is the use of L-moments as summary statistics in extreme value theory (EVT). This application shows the limited robustness of L-moments: L-statistics are not resistant statistics, since a single extreme value can throw them off, but because they are only linear (not higher-order statistics), they are less affected by extreme values than conventional moments.
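A minimal numerical sketch of this point, using hypothetical data and the sample_lmoments function from the earlier sketch:

```python
import numpy as np

# Hypothetical data: a small, well-behaved sample and the same sample with one outlier.
clean = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
spoiled = np.append(clean, 100.0)

for label, data in (("clean", clean), ("with outlier", spoiled)):
    l1, l2, _, _ = sample_lmoments(data)  # defined in the earlier sketch
    print(f"{label:>12}: std = {data.std(ddof=1):7.3f}   L-scale = {l2:7.3f}")

# Both statistics are inflated by the single extreme value (L-statistics are not
# resistant), but the standard deviation is pulled up considerably more than the L-scale.
```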
Another advantage L-moments have over conventional moments is that their existence only requires the random variable to have finite mean, so the L-moments exist even if the higher conventional moments do not exist (for example, for Student's t distribution with low degrees of freedom). A finite variance is required in addition in order for the standard errors of estimates of the L-moments to be finite. [1]
Some appearances of L-moments in the statistical literature include the book by David & Nagaraja (2003, Section 9.9) [12] and a number of papers. [10] [13] [14] [15] [16] [17] A number of favourable comparisons of L-moments with ordinary moments have been reported. [18] [19]
The table below gives expressions for the first two L-moments and numerical values of the first two L-moment ratios of some common continuous probability distributions with constant L-moment ratios. [1] [5] More complex expressions have been derived for some further distributions for which the L-moment ratios vary with one or more of the distributional parameters, including the log-normal, Gamma, generalized Pareto, generalized extreme value, and generalized logistic distributions. [1]
Distribution | Parameters | mean, λ1 | L-scale, λ2 | L-skewness, τ3 | L-kurtosis, τ4 |
---|---|---|---|---|---|
Uniform | a, b | (a+b) / 2 | (b–a) / 6 | 0 | 0 |
Logistic | μ, s | μ | s | 0 | 1⁄6 = 0.1667 |
Normal | μ, σ2 | μ | σ / √π | 0 | 0.1226 |
Laplace | μ, b | μ | 3b / 4 | 0 | 1 / (3√2) = 0.2357 |
Student's t, 2 d.f. | ν = 2 | 0 | π / (2√2) = 1.1107 | 0 | 3⁄8 = 0.375 |
Student's t, 4 d.f. | ν = 4 | 0 | 15π/64 = 0.7363 | 0 | 111/512 = 0.2168 |
Exponential | λ | 1 / λ | 1 / (2λ) | 1⁄3 = 0.3333 | 1⁄6 = 0.1667 |
Gumbel | μ, β | μ + γβ | β log 2 | 0.1699 | 0.1504 |
The notation for the parameters of each distribution is the same as that used in the linked article. In the expression for the mean of the Gumbel distribution, γ is the Euler–Mascheroni constant 0.57721... .
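As an illustration of the parameter-estimation use listed earlier, the table expressions can be inverted to give method-of-L-moments estimators. The following Python sketch does this for the Gumbel distribution, reusing the hypothetical sample_lmoments function from the earlier sketch: β is estimated from λ2 = β log 2 and μ from λ1 = μ + γβ.

```python
import math
import numpy as np

def fit_gumbel_lmoments(data):
    """Method-of-L-moments fit for the Gumbel distribution (illustrative sketch)."""
    l1, l2, _, _ = sample_lmoments(data)  # from the earlier sketch
    beta = l2 / math.log(2)               # invert  lambda_2 = beta * ln 2
    mu = l1 - np.euler_gamma * beta       # invert  lambda_1 = mu + gamma * beta
    return mu, beta

# Example: recover the parameters of simulated Gumbel(mu = 10, beta = 2) data.
rng = np.random.default_rng(1)
sample = rng.gumbel(loc=10.0, scale=2.0, size=50_000)
print(fit_gumbel_lmoments(sample))  # approximately (10.0, 2.0)
```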
Trimmed L-moments are generalizations of L-moments that give zero weight to extreme observations. They are therefore more robust to the presence of outliers, and unlike L-moments they may be well-defined for distributions for which the mean does not exist, such as the Cauchy distribution. [20]