# L-moment

In statistics, L-moments are a sequence of statistics used to summarize the shape of a probability distribution. [1] [2] [3] [4] They are linear combinations of order statistics (L-statistics) analogous to conventional moments, and can be used to calculate quantities analogous to standard deviation, skewness and kurtosis, termed the L-scale, L-skewness and L-kurtosis respectively (the L-mean is identical to the conventional mean). Standardised L-moments are called L-moment ratios and are analogous to standardized moments. Just as for conventional moments, a theoretical distribution has a set of population L-moments. Sample L-moments can be defined for a sample from the population, and can be used as estimators of the population L-moments.

## Population L-moments

For a random variable X, the rth population L-moment is [1]

${\displaystyle \lambda _{r}=r^{-1}\sum _{k=0}^{r-1}{(-1)^{k}{\binom {r-1}{k}}\mathrm {E} X_{r-k:r}},}$

where ${\displaystyle X_{k:n}}$ denotes the kth order statistic (kth smallest value) in an independent sample of size n from the distribution of X and ${\displaystyle \mathrm {E} }$ denotes expected value. In particular, the first four population L-moments are

${\displaystyle \lambda _{1}=\mathrm {E} X}$
${\displaystyle \lambda _{2}=(\mathrm {E} X_{2:2}-\mathrm {E} X_{1:2})/2}$
${\displaystyle \lambda _{3}=(\mathrm {E} X_{3:3}-2\mathrm {E} X_{2:3}+\mathrm {E} X_{1:3})/3}$
${\displaystyle \lambda _{4}=(\mathrm {E} X_{4:4}-3\mathrm {E} X_{3:4}+3\mathrm {E} X_{2:4}-\mathrm {E} X_{1:4})/4.}$

Apart from the factor 1/r, the coefficients of the rth L-moment are the signed binomial coefficients ${\displaystyle (-1)^{k}{\tbinom {r-1}{k}}}$ of the binomial transform, the same weights that appear in the finite difference of order r − 1 (the finite analog of the derivative).

The first two of these L-moments have conventional names:

${\displaystyle \lambda _{1}={\text{mean, L-mean or L-location}},}$
${\displaystyle \lambda _{2}={\text{L-scale}}.}$

The L-scale is equal to half the mean absolute difference. [5]
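As a small worked check of the definition, the sketch below evaluates the first four population L-moments of the Uniform(0, 1) distribution. It relies on the standard result that the expected kth order statistic of n uniform draws is k/(n + 1); the function names are illustrative and not taken from any library.

```python
from fractions import Fraction
from math import comb


def population_l_moment(r, expected_order_stat):
    """r-th population L-moment, given a function returning E[X_{k:n}]."""
    total = sum(
        (-1) ** k * comb(r - 1, k) * expected_order_stat(r - k, r)
        for k in range(r)
    )
    return total / r


def uniform01_order_stat(k, n):
    # Assumption: X ~ Uniform(0, 1), for which E[X_{k:n}] = k / (n + 1).
    return Fraction(k, n + 1)


# lambda_1 .. lambda_4 as exact fractions
l_moments = [population_l_moment(r, uniform01_order_stat) for r in range(1, 5)]
print(l_moments)
```

The second value, 1/6, is half of 1/3, which is the mean absolute difference of two independent Uniform(0, 1) draws, in line with the statement about the L-scale.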

## Sample L-moments

The sample L-moments can be computed as the population L-moments of the sample, summing over r-element subsets of the sample ${\displaystyle \left\{x_{1}<\cdots <x_{j}<\cdots <x_{r}\right\},}$ hence averaging by dividing by the binomial coefficient:

${\displaystyle \lambda _{r}=r^{-1}{\tbinom {n}{r}}^{-1}\sum _{x_{1}<\cdots <x_{j}<\cdots <x_{r}}\ \sum _{j=1}^{r}{(-1)^{r-j}{\binom {r-1}{r-j}}x_{j}}.}$

Grouping these by order statistic counts the number of ways an element of an n-element sample can be the jth element of an r-element subset, and yields formulas of the form below. Direct estimators for the first four L-moments in a finite sample of n observations are: [6]

${\displaystyle \ell _{1}={\tbinom {n}{1}}^{-1}\sum _{i=1}^{n}x_{(i)}}$
${\displaystyle \ell _{2}={\tfrac {1}{2}}{\tbinom {n}{2}}^{-1}\sum _{i=1}^{n}\left\{{\tbinom {i-1}{1}}-{\tbinom {n-i}{1}}\right\}x_{(i)}}$
${\displaystyle \ell _{3}={\tfrac {1}{3}}{\tbinom {n}{3}}^{-1}\sum _{i=1}^{n}\left\{{\tbinom {i-1}{2}}-2{\tbinom {i-1}{1}}{\tbinom {n-i}{1}}+{\tbinom {n-i}{2}}\right\}x_{(i)}}$
${\displaystyle \ell _{4}={\tfrac {1}{4}}{\tbinom {n}{4}}^{-1}\sum _{i=1}^{n}\left\{{\tbinom {i-1}{3}}-3{\tbinom {i-1}{2}}{\tbinom {n-i}{1}}+3{\tbinom {i-1}{1}}{\tbinom {n-i}{2}}-{\tbinom {n-i}{3}}\right\}x_{(i)}}$

where ${\displaystyle x_{(i)}}$ is the ith order statistic and ${\displaystyle {\tbinom {\cdot }{\cdot }}}$ is a binomial coefficient. Sample L-moments can also be defined indirectly in terms of probability weighted moments, [1] [7] [8] which leads to a more efficient algorithm for their computation. [6] [9]
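The direct estimators can be transcribed almost literally into code. The following is a minimal sketch rather than a reference implementation: `sample_l_moments` follows the four formulas above, and `sample_l_moment_bruteforce` is a hypothetical helper that averages over every r-element subset, included only to cross-check the grouped formulas on a small made-up sample.

```python
from itertools import combinations
from math import comb, isclose


def sample_l_moments(data):
    """First four sample L-moments via the direct estimators (needs n >= 4)."""
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("need at least four observations")
    l1 = l2 = l3 = l4 = 0.0
    for i, xi in enumerate(x, start=1):
        below, above = i - 1, n - i  # numbers of smaller / larger observations
        l1 += xi
        l2 += (comb(below, 1) - comb(above, 1)) * xi
        l3 += (comb(below, 2) - 2 * comb(below, 1) * comb(above, 1)
               + comb(above, 2)) * xi
        l4 += (comb(below, 3) - 3 * comb(below, 2) * comb(above, 1)
               + 3 * comb(below, 1) * comb(above, 2) - comb(above, 3)) * xi
    return (l1 / comb(n, 1), l2 / (2 * comb(n, 2)),
            l3 / (3 * comb(n, 3)), l4 / (4 * comb(n, 4)))


def sample_l_moment_bruteforce(data, r):
    """Average the r-th L-moment combination over all r-element subsets."""
    x = sorted(data)
    total = 0.0
    for subset in combinations(x, r):  # subsets of a sorted list stay sorted
        total += sum((-1) ** k * comb(r - 1, k) * subset[r - k - 1]
                     for k in range(r))
    return total / (r * comb(len(x), r))


data = [2.0, 3.5, 1.0, 7.0, 4.5, 3.0]  # illustrative values only
direct = sample_l_moments(data)
brute = [sample_l_moment_bruteforce(data, r) for r in range(1, 5)]
print(direct)
print(brute)
```

The brute-force version enumerates all subsets and is only practical for small n, whereas the grouped form needs a single pass over the sorted sample.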

## L-moment ratios

A set of L-moment ratios, or scaled L-moments, is defined by

${\displaystyle \tau _{r}=\lambda _{r}/\lambda _{2},\qquad r=3,4,\dots .}$

The most useful of these are ${\displaystyle \tau _{3}}$, called the L-skewness, and ${\displaystyle \tau _{4}}$, the L-kurtosis.

L-moment ratios lie within the interval (−1, 1). Tighter bounds can be found for some specific L-moment ratios; in particular, the L-kurtosis ${\displaystyle \tau _{4}}$ lies in [−1/4, 1), and

${\displaystyle {\tfrac {1}{4}}(5\tau _{3}^{2}-1)\leq \tau _{4}<1.}$ [1]

A quantity analogous to the coefficient of variation, but based on L-moments, can also be defined: ${\displaystyle \tau =\lambda _{2}/\lambda _{1},}$ which is called the "coefficient of L-variation", or "L-CV". For a non-negative random variable, this lies in the interval (0,1) [1] and is identical to the Gini coefficient. [10]
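To make the ratios concrete, the sketch below computes the L-CV, L-skewness and L-kurtosis of a small made-up sample and compares the L-CV with a Gini coefficient. Two assumptions are worth flagging: `sample_l_moment` uses a general form of the direct sample estimators, and `gini_coefficient` uses the pairwise (i < j) convention for the mean absolute difference; other sample conventions for the Gini coefficient differ slightly.

```python
from itertools import combinations
from math import comb, isclose


def sample_l_moment(data, r):
    """r-th sample L-moment (general form of the direct estimators)."""
    x = sorted(data)
    n = len(x)
    total = 0.0
    for i, xi in enumerate(x, start=1):
        # Signed count of r-subsets in which x_(i) is the (r - k)-th smallest.
        weight = sum((-1) ** k * comb(r - 1, k)
                     * comb(i - 1, r - 1 - k) * comb(n - i, k)
                     for k in range(r))
        total += weight * xi
    return total / (r * comb(n, r))


def l_moment_ratios(data):
    """Return (L-CV, L-skewness, L-kurtosis) for a sample."""
    l1, l2, l3, l4 = (sample_l_moment(data, r) for r in range(1, 5))
    return l2 / l1, l3 / l2, l4 / l2


def gini_coefficient(data):
    """Mean absolute difference over pairs i < j, divided by twice the mean."""
    pairs = list(combinations(data, 2))
    mean_abs_diff = sum(abs(a - b) for a, b in pairs) / len(pairs)
    return mean_abs_diff / (2 * sum(data) / len(data))


incomes = [12.0, 15.0, 20.0, 22.0, 30.0, 41.0, 95.0]  # made-up non-negative data
l_cv, t3, t4 = l_moment_ratios(incomes)
print(l_cv, gini_coefficient(incomes), t3, t4)
```

For this sample the L-CV and the pairwise Gini coefficient coincide up to rounding, and the L-kurtosis satisfies the lower bound stated above.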

L-moments are derived from probability weighted moments (PWMs), [11] which were defined earlier, in 1979. [7] PWMs are used to efficiently estimate the parameters of distributions expressible in inverse form, such as the Gumbel, [8] Tukey lambda, and Wakeby distributions.
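The PWM route can be sketched as follows. The code assumes the usual unbiased sample PWMs, b_r = n⁻¹ Σ C(i−1, r)/C(n−1, r) · x_(i), and the standard linear relations between the first four L-moments and b_0 … b_3; neither formula is spelled out in this article, so treat both as assumptions of the sketch, and the function names as illustrative.

```python
from math import comb, isclose


def sample_pwms(data, count=4):
    """Unbiased sample PWMs b_0 .. b_{count-1}; requires len(data) >= count."""
    x = sorted(data)
    n = len(x)
    if n < count:
        raise ValueError("sample too small for the requested PWMs")
    return [
        sum(comb(i - 1, r) * xi for i, xi in enumerate(x, start=1))
        / (n * comb(n - 1, r))
        for r in range(count)
    ]


def l_moments_from_pwms(b):
    """First four sample L-moments as linear combinations of b_0 .. b_3."""
    b0, b1, b2, b3 = b
    return (
        b0,
        2 * b1 - b0,
        6 * b2 - 6 * b1 + b0,
        20 * b3 - 30 * b2 + 12 * b1 - b0,
    )


sample = [12.0, 15.0, 20.0, 22.0, 30.0, 41.0, 95.0]  # made-up data
l1, l2, l3, l4 = l_moments_from_pwms(sample_pwms(sample))
print(l1, l2, l3 / l2, l4 / l2)
```

After sorting, each PWM is a single weighted sum over the data, so the four L-moments cost one sort plus a few linear passes.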

## Usage

There are two common ways that L-moments are used, in both cases analogously to the conventional moments:

1. As summary statistics for data.
2. To derive estimators for the parameters of probability distributions, applying the method of moments to the L-moments rather than conventional moments.

Parameter estimation is more commonly carried out by maximum likelihood than by either kind of moments; however, using L-moments provides a number of advantages. Specifically, L-moments are more robust than conventional moments, and the existence of higher L-moments requires only that the random variable have a finite mean. One disadvantage of L-moment ratios for estimation is their typically smaller sensitivity. For instance, the Laplace distribution has a kurtosis of 6 and weak exponential tails, yet it has a larger fourth L-moment ratio than, for example, the Student's t distribution with 3 degrees of freedom, which has infinite kurtosis and much heavier tails.

As an example, consider a dataset with a few data points and one outlying value. The ordinary standard deviation of this dataset is strongly influenced by that one point, whereas the L-scale is far less sensitive to it. Consequently, L-moments are far more meaningful than conventional moments when dealing with outliers in data. However, other methods are better suited to achieving still higher robustness than simply replacing moments with L-moments. The use of L-moments as summary statistics in extreme value theory (EVT) illustrates their limited robustness: L-statistics are not resistant statistics, since a single extreme value can throw them off, but because they are only linear in the data (not higher-order statistics), they are less affected by extreme values than conventional moments.
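The outlier example can be tried out directly. The sketch below uses made-up measurements and compares how much the sample standard deviation and the sample L-scale grow when a single outlying value is appended; `l_scale` is an illustrative helper implementing the direct estimator of the second L-moment.

```python
from math import comb
from statistics import stdev


def l_scale(data):
    """Sample L-scale: direct estimator of the second L-moment."""
    x = sorted(data)
    n = len(x)
    # (i - 1) - (n - i) = 2i - n - 1 is the weight of the i-th order statistic.
    weighted = sum((2 * i - n - 1) * xi for i, xi in enumerate(x, start=1))
    return weighted / (2 * comb(n, 2))


clean = [9.7, 9.8, 9.9, 10.0, 10.0, 10.1, 10.2, 10.3]  # made-up measurements
contaminated = clean + [25.0]                            # one outlying value

sd_inflation = stdev(contaminated) / stdev(clean)
l_scale_inflation = l_scale(contaminated) / l_scale(clean)
print(f"standard deviation grows by a factor of {sd_inflation:.1f}")
print(f"L-scale grows by a factor of {l_scale_inflation:.1f}")
```

Both statistics move, consistent with L-statistics not being resistant, but on this data the L-scale is inflated by a smaller factor than the standard deviation.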

Because only a finite mean is required, L-moments exist even when the higher conventional moments do not (for example, for Student's t distribution with low degrees of freedom). A finite variance is additionally required for the standard errors of estimates of the L-moments to be finite. [1]

Some appearances of L-moments in the statistical literature include the book by David & Nagaraja (2003, Section 9.9) [12] and a number of papers. [10] [13] [14] [15] [16] [17] A number of favourable comparisons of L-moments with ordinary moments have been reported. [18] [19]

## Values for some common distributions

The table below gives expressions for the first two L-moments and numerical values of the first two L-moment ratios of some common continuous probability distributions with constant L-moment ratios. [1] [5] More complex expressions have been derived for some further distributions for which the L-moment ratios vary with one or more of the distributional parameters, including the log-normal, Gamma, generalized Pareto, generalized extreme value, and generalized logistic distributions. [1]

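| Distribution | Parameters | Mean, λ₁ | L-scale, λ₂ | L-skewness, τ₃ | L-kurtosis, τ₄ |
|---|---|---|---|---|---|
| Uniform | a, b | (a + b) / 2 | (b − a) / 6 | 0 | 0 |
| Logistic | μ, s | μ | s | 0 | 1/6 = 0.1667 |
| Normal | μ, σ² | μ | σ / √π | 0 | 0.1226 |
| Laplace | μ, b | μ | 3b / 4 | 0 | 1 / (3√2) = 0.2357 |
| Student's t, 2 d.f. | ν = 2 | 0 | π / 2^(3/2) = 1.111 | 0 | 3/8 = 0.375 |
| Student's t, 4 d.f. | ν = 4 | 0 | 15π / 64 = 0.7363 | 0 | 111/512 = 0.2168 |
| Exponential | λ | 1 / λ | 1 / (2λ) | 1/3 = 0.3333 | 1/6 = 0.1667 |
| Gumbel | μ, β | μ + γβ | β log 2 | 0.1699 | 0.1504 |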

The notation for the parameters of each distribution is the same as that used in the linked article. In the expression for the mean of the Gumbel distribution, γ is the Euler–Mascheroni constant 0.57721... .
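The Gumbel row can be inverted to illustrate the method of L-moments: matching the sample values to λ₁ = μ + γβ and λ₂ = β log 2 gives β = λ₂ / log 2 and μ = λ₁ − γβ. The sketch below is a minimal illustration on simulated data; the function names, the seed and the "true" parameters (μ = 10, β = 2) are arbitrary choices made for the example.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler–Mascheroni constant


def first_two_l_moments(data):
    """Sample L-mean and L-scale via the direct estimators."""
    x = sorted(data)
    n = len(x)
    l1 = sum(x) / n
    l2 = sum((2 * i - n - 1) * xi for i, xi in enumerate(x, start=1)) / (n * (n - 1))
    return l1, l2


def fit_gumbel(data):
    """Method-of-L-moments estimates (mu, beta) for a Gumbel distribution."""
    l1, l2 = first_two_l_moments(data)
    beta = l2 / math.log(2)
    mu = l1 - EULER_GAMMA * beta
    return mu, beta


def gumbel_sample(mu, beta, size, rng):
    """Draw Gumbel variates by inverting the CDF exp(-exp(-(x - mu) / beta))."""
    out = []
    while len(out) < size:
        u = rng.random()
        if u > 0.0:  # skip u == 0 to avoid log(0)
            out.append(mu - beta * math.log(-math.log(u)))
    return out


rng = random.Random(0)  # fixed seed so the run is repeatable
mu_hat, beta_hat = fit_gumbel(gumbel_sample(10.0, 2.0, 20000, rng))
print(mu_hat, beta_hat)
```

With a sample of this size the estimates should land close to the parameters used for the simulation, although the exact values depend on the seed.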

## Extensions

Trimmed L-moments are generalizations of L-moments that give zero weight to extreme observations. They are therefore more robust to the presence of outliers, and unlike L-moments they may be well-defined for distributions for which the mean does not exist, such as the Cauchy distribution. [20]
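A brute-force sketch can show how trimming removes the influence of extremes. It assumes the definition in which the rth trimmed L-moment with trimming (t1, t2) replaces the expected order statistic X_{r−k:r} by X_{r+t1−k : r+t1+t2}; this formula is not stated in the article and is an assumption here, and `trimmed_l_moment` is a hypothetical helper that averages over all subsets of size r + t1 + t2.

```python
from itertools import combinations
from math import comb


def trimmed_l_moment(data, r, t1=1, t2=1):
    """Sample TL-moment by averaging over all subsets of size r + t1 + t2."""
    x = sorted(data)
    m = r + t1 + t2
    if len(x) < m:
        raise ValueError("sample too small for this order and trimming")
    total = 0.0
    for subset in combinations(x, m):  # each subset is already sorted
        # Only positions t1+1 .. t1+r of the subset are used, so the t1
        # smallest and t2 largest members of every subset get zero weight.
        total += sum((-1) ** k * comb(r - 1, k) * subset[r + t1 - k - 1]
                     for k in range(r))
    return total / (r * comb(len(x), m))


data = [1.0, 2.0, 3.0, 4.0, 100.0]       # made-up sample with one extreme value
tl_mean = trimmed_l_moment(data, 1)       # average median of all 3-subsets
plain_mean = trimmed_l_moment(data, 1, 0, 0)  # no trimming: ordinary mean
print(tl_mean, plain_mean)
```

With t1 = t2 = 0 the function reduces to the ordinary sample L-moment, and replacing 100.0 by any larger value leaves the trimmed mean unchanged because the sample maximum never enters the sum. Enumerating subsets is only feasible for small samples.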

## References

1. Hosking, J.R.M. (1990). "L-moments: analysis and estimation of distributions using linear combinations of order statistics". Journal of the Royal Statistical Society, Series B. 52 (1): 105–124. JSTOR 2345653.
2. Hosking, J.R.M. (1992). "Moments or L moments? An example comparing two measures of distributional shape". The American Statistician. 46 (3): 186–189. doi:10.2307/2685210. JSTOR 2685210.
3. Hosking, J.R.M. (2006). "On the characterization of distributions by their L-moments". Journal of Statistical Planning and Inference. 136: 193–198. doi:10.1016/j.jspi.2004.06.004.
4. Asquith, W.H. (2011). Distributional analysis with L-moment statistics using the R environment for statistical computing. Create Space Independent Publishing Platform [print-on-demand]. ISBN 1-463-50841-7.
5. Jones, M.C. (2002). "Student's Simplest Distribution". Journal of the Royal Statistical Society, Series D. 51 (1): 41–49. doi:10.1111/1467-9884.00297. JSTOR 3650389.
6. Wang, Q. J. (1996). "Direct Sample Estimators of L Moments". Water Resources Research. 32 (12): 3617–3619. doi:10.1029/96WR02675.
7. Greenwood, JA; Landwehr, JM; Matalas, NC; Wallis, JR (1979). "Probability Weighted Moments: Definition and relation to parameters of several distributions expressed in inverse form" (PDF). Water Resources Research. 15 (5): 1049–1054. doi:10.1029/WR015i005p01049. Archived from the original (PDF) on 2020-02-10.
8. Landwehr, JM; Matalas, NC; Wallis, JR (1979). "Probability weighted moments compared with some traditional techniques in estimating Gumbel parameters and quantiles". Water Resources Research. 15 (5): 1055–1064. doi:10.1029/WR015i005p01055.
9. "L Moments". NIST Dataplot documentation. 6 January 2006. Retrieved 19 January 2013.
10. Valbuena, R.; Maltamo, M.; Mehtätalo, L.; Packalen, P. (2017). "Key structural features of Boreal forests may be detected directly using L-moments from airborne lidar data". Remote Sensing of Environment. 194: 437–446. doi:10.1016/j.rse.2016.10.024.
11. Hosking, JRM; Wallis, JR (2005). Regional Frequency Analysis: An Approach Based on L-moments. Cambridge University Press. p. 3. ISBN 978-0521019408. Retrieved 22 January 2013.
12. David, H. A.; Nagaraja, H. N. (2003). Order Statistics (3rd ed.). Wiley. ISBN   978-0-471-38926-2.
13. Serfling, R.; Xiao, P. (2007). "A contribution to multivariate L-moments: L-comoment matrices". Journal of Multivariate Analysis. 98 (9): 1765–1781. doi:10.1016/j.jmva.2007.01.008.
14. Delicado, P.; Goria, M. N. (2008). "A small sample comparison of maximum likelihood, moments and L-moments methods for the asymmetric exponential power distribution". Computational Statistics & Data Analysis. 52 (3): 1661–1673. doi:10.1016/j.csda.2007.05.021.
15. Alkasasbeh, M. R.; Raqab, M. Z. (2009). "Estimation of the generalized logistic distribution parameters: comparative study". Statistical Methodology. 6 (3): 262–279. doi:10.1016/j.stamet.2008.10.001.
16. Jones, M. C. (2004). "On some expressions for variance, covariance, skewness and L-moments". Journal of Statistical Planning and Inference. 126 (1): 97–106. doi:10.1016/j.jspi.2003.09.001.
17. Jones, M. C. (2009). "Kumaraswamy's distribution: A beta-type distribution with some tractability advantages". Statistical Methodology. 6 (1): 70–81. doi:10.1016/j.stamet.2008.04.001.
18. Royston, P. (1992). "Which measures of skewness and kurtosis are best?". Statistics in Medicine. 11 (3): 333–343. doi:10.1002/sim.4780110306.
19. Ulrych, T. J.; Velis, D. R.; Woodbury, A. D.; Sacchi, M. D. (2000). "L-moments and C-moments". Stochastic Environmental Research and Risk Assessment. 14 (1): 50–68. doi:10.1007/s004770050004.
20. Elamir, Elsayed A. H.; Seheult, Allan H. (2003). "Trimmed L-moments". Computational Statistics & Data Analysis. 43 (3): 299–314. doi:10.1016/S0167-9473(02)00250-5.