Heavy-tailed distribution

In probability theory, heavy-tailed distributions are probability distributions whose tails are not exponentially bounded: [1] that is, they have heavier tails than the exponential distribution. In many applications it is the right tail of the distribution that is of interest, but a distribution may have a heavy left tail, or both tails may be heavy.


There are three important subclasses of heavy-tailed distributions: the fat-tailed distributions, the long-tailed distributions, and the subexponential distributions. In practice, all commonly used heavy-tailed distributions belong to the subexponential class, introduced by Jozef Teugels. [2]

There is still some discrepancy over the use of the term heavy-tailed. There are two other definitions in use. Some authors use the term to refer to those distributions which do not have all their power moments finite; and some others to those distributions that do not have a finite variance. The definition given in this article is the most general in use, and includes all distributions encompassed by the alternative definitions, as well as those distributions such as log-normal that possess all their power moments, yet which are generally considered to be heavy-tailed. (Occasionally, heavy-tailed is used for any distribution that has heavier tails than the normal distribution.)


Definition of heavy-tailed distribution

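The distribution of a random variable $X$ with distribution function $F$ is said to have a heavy (right) tail if the moment generating function of $X$, $M_X(t)$, is infinite for all $t > 0$. [3]

That means

$$ \int_{-\infty}^{\infty} e^{tx}\,dF(x) = \infty \quad \text{for all } t > 0. $$

This is also written in terms of the tail distribution function

$$ \overline{F}(x) \equiv \Pr[X > x] $$

as

$$ \limsup_{x \to \infty} e^{tx}\,\overline{F}(x) = \infty \quad \text{for all } t > 0. $$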
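As a numerical illustration (a sketch, not taken from the cited sources), the following Python snippet evaluates $e^{tx}\,\Pr[X > x]$ for a Pareto tail and for an exponential tail. The function names, the Pareto scale of 1, and the parameter values are assumptions chosen only for the example. For the Pareto tail the product grows without bound for every $t > 0$, whereas for the exponential tail it vanishes whenever $t$ is smaller than the rate.

```python
import math


def pareto_tail(x, alpha=2.0):
    """Tail P[X > x] of a Pareto distribution with scale 1 (assumed example)."""
    return x ** (-alpha) if x >= 1.0 else 1.0


def exponential_tail(x, rate=1.0):
    """Tail P[X > x] of an exponential distribution with the given rate."""
    return math.exp(-rate * x) if x >= 0.0 else 1.0


def weighted_tail(tail, t, x):
    """Evaluate e^{t x} * tail(x), the quantity in the heavy-tail condition."""
    return math.exp(t * x) * tail(x)


t = 0.5  # any t > 0 works for the Pareto case; t < rate for the exponential case
for x in (10.0, 50.0, 100.0):
    print(x, weighted_tail(pareto_tail, t, x), weighted_tail(exponential_tail, t, x))
```

The Pareto column increases rapidly with `x`, while the exponential column decreases towards zero, which is the sense in which the exponential tail is "exponentially bounded".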


Definition of long-tailed distribution

The distribution of a random variable $X$ with distribution function $F$ is said to have a long right tail [1] if for all $t > 0$,

$$ \lim_{x \to \infty} \Pr[X > x + t \mid X > x] = 1, $$

or equivalently

$$ \overline{F}(x + t) \sim \overline{F}(x) \quad \text{as } x \to \infty. $$

Intuitively, if a quantity with a long right tail exceeds some high level, then the probability that it also exceeds any fixed higher level approaches 1.

All long-tailed distributions are heavy-tailed, but the converse is false, and it is possible to construct heavy-tailed distributions that are not long-tailed.
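The long-tail property can be checked numerically. The sketch below (illustrative only; the function names and parameter values are assumptions) computes the conditional probability $\Pr[X > x + t \mid X > x]$ as a ratio of tail probabilities, for a Pareto tail and for an exponential tail. For the Pareto tail the ratio tends to 1 as $x$ grows; for the exponential tail it stays at $e^{-t}$, so the exponential distribution is not long-tailed.

```python
import math


def tail_ratio(tail, x, t):
    """Return tail(x + t) / tail(x), i.e. P[X > x + t | X > x]."""
    return tail(x + t) / tail(x)


def pareto_tail(x, alpha=1.5):
    """Tail of a Pareto distribution with scale 1; valid for x >= 1."""
    return x ** (-alpha)


def exponential_tail(x, rate=1.0):
    """Tail of an exponential distribution; valid for x >= 0."""
    return math.exp(-rate * x)


t = 1.0
for x in (10.0, 1e3, 1e6):
    print(x, tail_ratio(pareto_tail, x, t), tail_ratio(exponential_tail, min(x, 100.0), t))
```

The exponential ratio is evaluated at `min(x, 100.0)` only to avoid floating-point underflow; by memorylessness it does not depend on `x`.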

Subexponential distributions

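Subexponentiality is defined in terms of convolutions of probability distributions. For two independent, identically distributed random variables $X_1, X_2$ with a common distribution function $F$, the convolution of $F$ with itself, written $F^{*2}$ and called the convolution square, is defined using Lebesgue–Stieltjes integration by:

$$ \Pr[X_1 + X_2 \leq x] = F^{*2}(x) = \int_{-\infty}^{\infty} F(x - y)\,dF(y), $$

and the n-fold convolution $F^{*n}$ is defined inductively by the rule:

$$ F^{*n}(x) = \int_{-\infty}^{\infty} F(x - y)\,dF^{*(n-1)}(y). $$

The tail distribution function $\overline{F}$ is defined as $\overline{F}(x) = 1 - F(x)$.

A distribution $F$ on the positive half-line is subexponential [1] [5] [2] if

$$ \overline{F^{*2}}(x) \sim 2\,\overline{F}(x) \quad \text{as } x \to \infty. $$

This implies [6] that, for any $n \geq 1$,

$$ \overline{F^{*n}}(x) \sim n\,\overline{F}(x) \quad \text{as } x \to \infty. $$

The probabilistic interpretation [6] of this is that, for a sum of $n$ independent random variables $X_1, \ldots, X_n$ with common distribution $F$,

$$ \Pr[X_1 + \cdots + X_n > x] \sim \Pr[\max(X_1, \ldots, X_n) > x] \quad \text{as } x \to \infty. $$

This is often known as the principle of the single big jump [7] or catastrophe principle. [8]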
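The single-big-jump principle can be illustrated with a small Monte Carlo experiment. The following is a sketch under stated assumptions: the summands are Pareto distributed with scale 1 and shape `alpha`, generated by inverse-transform sampling, and the function name, level, and trial count are arbitrary choices for the example. It estimates the probability that the sum exceeds a high level and the probability that the maximum alone does.

```python
import random


def single_big_jump_check(n_terms=5, alpha=1.5, level=200.0, trials=100_000, seed=0):
    """Estimate P[sum > level] and P[max > level] for i.i.d. Pareto(alpha) terms."""
    rng = random.Random(seed)
    sum_hits = 0
    max_hits = 0
    for _ in range(trials):
        # Inverse transform: if U is uniform on (0, 1], U**(-1/alpha) is Pareto(alpha) on [1, inf).
        terms = [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n_terms)]
        if sum(terms) > level:
            sum_hits += 1
        if max(terms) > level:
            max_hits += 1
    return sum_hits / trials, max_hits / trials


p_sum, p_max = single_big_jump_check()
print(p_sum, p_max)
```

Because the terms are positive, the sum exceeds the level whenever the maximum does, so the first estimate is never smaller than the second. For a subexponential distribution the two estimates should be of comparable size at high levels, which means that a large sum is typically caused by one large term.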

A distribution $F$ on the whole real line is subexponential if the distribution $F\,I([0,\infty))$ is. [9] Here $I([0,\infty))$ is the indicator function of the positive half-line. Alternatively, a random variable $X$ supported on the real line is subexponential if and only if $X^{+} = \max(0, X)$ is subexponential.

All subexponential distributions are long-tailed, but examples can be constructed of long-tailed distributions that are not subexponential.

Common heavy-tailed distributions

All commonly used heavy-tailed distributions are subexponential. [6]

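Those that are one-tailed include, for example, the Pareto distribution, the log-normal distribution, and the log-logistic distribution.

Those that are two-tailed include, for example, the Cauchy distribution and Student's t-distribution.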

Relationship to fat-tailed distributions

A fat-tailed distribution is a distribution for which the probability density function, for large $x$, goes to zero as a power $x^{-a}$. Since such a power is, for large $x$, always bounded below by the probability density function of an exponential distribution, fat-tailed distributions are always heavy-tailed. Some distributions, however, have a tail which goes to zero more slowly than an exponential function (meaning they are heavy-tailed), but faster than a power (meaning they are not fat-tailed). An example is the log-normal distribution. Many other heavy-tailed distributions, such as the log-logistic and Pareto distributions, are also fat-tailed.
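The log-normal case can be verified with a short calculation. The following worked example assumes the standard log-normal density (parameters $\mu = 0$, $\sigma = 1$, chosen only for simplicity) and compares it with an arbitrary power $x^{-a}$ and an arbitrary exponential $e^{-tx}$.

```latex
% Standard log-normal density (mu = 0, sigma = 1 assumed):
f(x) = \frac{1}{x\sqrt{2\pi}} \exp\!\left(-\frac{(\ln x)^2}{2}\right), \qquad x > 0.

% Faster than any power: for every a > 0,
x^{a} f(x) = \frac{1}{\sqrt{2\pi}}
  \exp\!\left((a - 1)\ln x - \frac{(\ln x)^2}{2}\right) \to 0
  \quad \text{as } x \to \infty,
% because the quadratic term in \ln x dominates the linear one.

% Slower than any exponential: for every t > 0,
e^{tx} f(x) = \frac{1}{\sqrt{2\pi}}
  \exp\!\left(tx - \ln x - \frac{(\ln x)^2}{2}\right) \to \infty
  \quad \text{as } x \to \infty,
% because tx grows faster than any power of \ln x.
```

So the density decays faster than every power law but more slowly than every exponential, which is exactly the intermediate behaviour described above.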

Estimating the tail-index

There are parametric [6] and non-parametric [14] approaches to the problem of tail-index estimation.

To estimate the tail-index using the parametric approach, some authors fit a generalized extreme value (GEV) distribution or a Pareto distribution, typically by maximum likelihood estimation (MLE).
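As a minimal sketch of the parametric approach, assume the data follow a Pareto distribution with a known scale (minimum value) $x_m$ and unknown shape $\alpha$. The maximum-likelihood estimate of the shape then has the closed form $\hat\alpha = n / \sum_i \ln(x_i / x_m)$. The function name and the simulated data below are illustrative assumptions; in practice the scale is usually unknown and real data are only approximately Pareto in the tail.

```python
import math
import random


def pareto_mle_tail_index(sample, scale):
    """Closed-form MLE of the Pareto shape alpha, assuming a known scale x_m."""
    if any(x < scale for x in sample):
        raise ValueError("all observations must be >= scale")
    log_excess = sum(math.log(x / scale) for x in sample)
    if log_excess == 0.0:
        raise ValueError("all observations equal the scale; alpha is not identifiable")
    return len(sample) / log_excess


# Usage on simulated Pareto data (true alpha = 2.5, scale = 2.0), via inverse-transform sampling.
rng = random.Random(1)
data = [2.0 * (1.0 - rng.random()) ** (-1.0 / 2.5) for _ in range(20_000)]
print(pareto_mle_tail_index(data, scale=2.0))
```

Note that this estimates the Pareto shape $\alpha$; in the extreme value parametrization the corresponding shape parameter is $\xi = 1/\alpha$.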

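Pickands' tail-index estimator

Let $(X_n, n \geq 1)$ be a sequence of independent random variables with a common distribution function $F \in D(H(\xi))$, the maximum domain of attraction [15] of the generalized extreme value distribution $H$, where $\xi \in \mathbb{R}$. If $\lim_{n \to \infty} k(n) = \infty$ and $\lim_{n \to \infty} k(n)/n = 0$, then the Pickands tail-index estimator is [6] [15]

$$ \hat\xi^{\text{Pickands}}_{(k(n),n)} = \frac{1}{\ln 2} \ln\left( \frac{X_{(n-k(n)+1,n)} - X_{(n-2k(n)+1,n)}}{X_{(n-2k(n)+1,n)} - X_{(n-4k(n)+1,n)}} \right), $$

where $X_{(i,n)}$ denotes the $i$-th smallest of $X_1, \ldots, X_n$. This estimator converges in probability to $\xi$.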
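A direct implementation of the Pickands estimator can be sketched as follows. It assumes the usual form of the estimator, $\frac{1}{\ln 2}\ln\frac{X_{(n-k+1,n)} - X_{(n-2k+1,n)}}{X_{(n-2k+1,n)} - X_{(n-4k+1,n)}}$ with ascending order statistics, and a continuous sample without ties among the order statistics used. The function name and the choice of `k` in the usage example are illustrative; choosing `k` well is a separate problem that is not addressed here.

```python
import math
import random


def pickands_estimator(sample, k):
    """Pickands estimate of the extreme value index xi from the top 4k order statistics."""
    n = len(sample)
    if k < 1 or 4 * k > n:
        raise ValueError("k must satisfy 1 <= k and 4 * k <= n")
    ordered = sorted(sample)      # ascending, so ordered[i - 1] is X_(i,n)
    x_k = ordered[n - k]          # X_(n-k+1,n)
    x_2k = ordered[n - 2 * k]     # X_(n-2k+1,n)
    x_4k = ordered[n - 4 * k]     # X_(n-4k+1,n)
    return math.log((x_k - x_2k) / (x_2k - x_4k)) / math.log(2.0)


# Usage on simulated Pareto data with shape alpha = 2, for which xi = 1 / alpha = 0.5.
rng = random.Random(2)
data = [(1.0 - rng.random()) ** (-1.0 / 2.0) for _ in range(100_000)]
print(pickands_estimator(data, k=2000))
```

The estimator uses only three order statistics, so it is simple and valid for any real $\xi$, but its estimates tend to fluctuate noticeably as `k` varies.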

Hill's tail-index estimator

Let $(X_t, t \geq 1)$ be a sequence of independent and identically distributed random variables with distribution function $F \in D(H(\xi))$, the maximum domain of attraction of the generalized extreme value distribution $H$, where $\xi > 0$. The sample path is $\{X_t : 1 \leq t \leq n\}$, where $n$ is the sample size. If $\{k(n)\}$ is an intermediate order sequence, i.e. $k(n) \in \{1, \ldots, n-1\}$, $k(n) \to \infty$ and $k(n)/n \to 0$, then the Hill tail-index estimator is [16]

$$ \hat\xi^{\text{Hill}}_{(k(n),n)} = \frac{1}{k(n)} \sum_{i=n-k(n)+1}^{n} \left( \ln X_{(i,n)} - \ln X_{(n-k(n),n)} \right), $$

where $X_{(i,n)}$ is the $i$-th order statistic (in ascending order) of $X_1, \ldots, X_n$. This estimator converges in probability to $\xi$, and is asymptotically normal provided $k(n) \to \infty$ is restricted based on a higher order regular variation property. [17] [18] Consistency and asymptotic normality extend to a large class of dependent and heterogeneous sequences, [19] [20] irrespective of whether $X_t$ is observed, or is a computed residual or filtered data from a large class of models and estimators, including mis-specified models and models with errors that are dependent. [21] [22] [23] Note that both Pickands' and Hill's tail-index estimators commonly make use of the logarithm of the order statistics. [24]
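The Hill estimator is straightforward to compute. The sketch below assumes the common form in which the estimate of $\xi$ is the average log-excess of the $k$ largest observations over the $(k+1)$-th largest, that is $\frac{1}{k}\sum_{i=n-k+1}^{n} \ln\left(X_{(i,n)} / X_{(n-k,n)}\right)$, and that the observations involved are positive. The function name and the value of `k` in the usage example are illustrative assumptions.

```python
import math
import random


def hill_estimator(sample, k):
    """Hill estimate of xi: mean log-excess of the k largest values over X_(n-k,n)."""
    n = len(sample)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    ordered = sorted(sample)          # ascending, so ordered[i - 1] is X_(i,n)
    threshold = ordered[n - k - 1]    # X_(n-k,n)
    if threshold <= 0.0:
        raise ValueError("the threshold order statistic must be positive")
    top = ordered[n - k:]             # X_(n-k+1,n), ..., X_(n,n)
    return sum(math.log(x / threshold) for x in top) / k


# Usage on simulated Pareto data with shape alpha = 2, for which xi = 1 / alpha = 0.5.
rng = random.Random(3)
data = [(1.0 - rng.random()) ** (-1.0 / 2.0) for _ in range(50_000)]
xi_hat = hill_estimator(data, k=2000)
print(xi_hat, 1.0 / xi_hat)  # estimate of xi and the implied tail index alpha
```

In practice the estimate is usually plotted against `k` (a "Hill plot") and read off from a region where it is roughly stable, since no single `k` is correct for all samples.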

Ratio estimator of the tail-index

The ratio estimator (RE-estimator) of the tail-index was introduced by Goldie and Smith. [25] It is constructed similarly to Hill's estimator but uses a non-random "tuning parameter".

A comparison of Hill-type and RE-type estimators can be found in Novak. [14]
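For illustration, one commonly quoted form of the ratio estimator is $a_n = \sum_{i} \ln(X_i / x_n)\,\mathbf{1}\{X_i > x_n\} \big/ \sum_{i} \mathbf{1}\{X_i > x_n\}$, where the threshold $x_n$ is the non-random tuning parameter. This form is an assumption here, since the formula itself is not reproduced in this article; the function name and threshold value are likewise illustrative. The sketch below computes it and applies it to simulated Pareto data.

```python
import math
import random


def ratio_estimator(sample, threshold):
    """Mean log-excess over a fixed (non-random) positive threshold; estimates xi = 1 / alpha."""
    if threshold <= 0.0:
        raise ValueError("threshold must be positive")
    excesses = [math.log(x / threshold) for x in sample if x > threshold]
    if not excesses:
        raise ValueError("no observations exceed the threshold")
    return sum(excesses) / len(excesses)


# Usage on simulated Pareto data with shape alpha = 2, for which xi = 0.5.
rng = random.Random(4)
data = [(1.0 - rng.random()) ** (-1.0 / 2.0) for _ in range(50_000)]
print(ratio_estimator(data, threshold=5.0))
```

The contrast with Hill's estimator is that the number of exceedances is random here while the threshold is fixed, whereas Hill's estimator fixes the number of upper order statistics and lets the threshold be random.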


Estimation of heavy-tailed density

Nonparametric approaches to estimating heavy- and superheavy-tailed probability density functions are given in Markovich. [27] They include approaches based on variable-bandwidth and long-tailed kernel estimators; approaches based on a preliminary transformation of the data to a new random variable on a finite or infinite interval that is more convenient for estimation, followed by an inverse transformation of the resulting density estimate; and the "piecing-together" approach, which combines a parametric model for the tail of the density with a nonparametric model that approximates the mode of the density.

Nonparametric estimators require an appropriate choice of tuning (smoothing) parameters, such as the bandwidth of a kernel estimator or the bin width of a histogram. Well-known data-driven methods for this choice are cross-validation and its modifications, and methods based on minimizing the mean squared error (MSE), its asymptotic form, or their upper bounds. [28] A discrepancy method, which uses well-known nonparametric statistics such as the Kolmogorov–Smirnov, von Mises, and Anderson–Darling statistics as a metric in the space of distribution functions (dfs), and quantiles of the latter statistics as a known uncertainty or discrepancy value, is described in Markovich. [27] The bootstrap is another tool for finding smoothing parameters; it approximates the unknown MSE using different resampling schemes (see, e.g., [29]).
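The transformation idea can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the specific method of the cited works: it uses a logarithmic transform $Y = \ln X$ for positive data, a Gaussian kernel, and the normal-reference rule of thumb $h = 1.06\,\hat\sigma\,n^{-1/5}$ for the bandwidth. The density estimate on the log scale is mapped back using $f_X(x) = f_Y(\ln x)/x$. The function name is hypothetical.

```python
import math
import random
import statistics


def log_transform_kde(sample, bandwidth=None):
    """Return a density estimate for positive data: Gaussian KDE on log(x), then back-transform."""
    logs = [math.log(x) for x in sample]
    n = len(logs)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumed, simple default).
        bandwidth = 1.06 * statistics.stdev(logs) * n ** (-0.2)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        if x <= 0.0:
            return 0.0
        y = math.log(x)
        kernel_sum = sum(math.exp(-0.5 * ((y - yi) / bandwidth) ** 2) for yi in logs)
        # Change of variables: f_X(x) = f_Y(log x) / x.
        return norm * kernel_sum / x

    return density


# Usage on simulated standard log-normal data.
rng = random.Random(5)
data = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]
f_hat = log_transform_kde(data)
print(f_hat(1.0), f_hat(10.0))
```

Working on the log scale makes a fixed bandwidth behave like a bandwidth that widens in the tail of the original data, which is one reason such transforms are convenient for heavy-tailed samples.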


References

  1. Asmussen, S. R. (2003). "Steady-State Properties of GI/G/1". Applied Probability and Queues. Stochastic Modelling and Applied Probability. Vol. 51. pp. 266–301. doi:10.1007/0-387-21525-5_10. ISBN 978-0-387-00211-8.
  2. Teugels, Jozef L. (1975). "The Class of Subexponential Distributions". Annals of Probability. University of Louvain. 3 (6). doi:10.1214/aop/1176996225. Retrieved April 7, 2019.
  3. Rolski, Schmidli, Schmidt, Teugels (1999). Stochastic Processes for Insurance and Finance.
  4. Foss, S.; Korshunov, D.; Zachary, S. (2013). An Introduction to Heavy-Tailed and Subexponential Distributions. Springer Science & Business Media.
  5. Chistyakov, V. P. (1964). "A Theorem on Sums of Independent Positive Random Variables and Its Applications to Branching Random Processes". ResearchGate. Retrieved April 7, 2019.
  6. Embrechts P.; Klueppelberg C.; Mikosch T. (1997). Modelling extremal events for insurance and finance. Stochastic Modelling and Applied Probability. Vol. 33. Berlin: Springer. doi:10.1007/978-3-642-33483-2. ISBN 978-3-642-08242-9.
  7. Foss, S.; Konstantopoulos, T.; Zachary, S. (2007). "Discrete and Continuous Time Modulated Random Walks with Heavy-Tailed Increments" (PDF). Journal of Theoretical Probability. 20 (3): 581. arXiv:math/0509605. doi:10.1007/s10959-007-0081-2. S2CID 3047753.
  8. Wierman, Adam (January 9, 2014). "Catastrophes, Conspiracies, and Subexponential Distributions (Part III)". Rigor + Relevance blog. RSRG, Caltech. Retrieved January 9, 2014.
  9. Willekens, E. (1986). "Subexponentiality on the real line". Technical Report. K.U. Leuven.
  10. Falk, M.; Hüsler, J.; Reiss, R. (2010). Laws of Small Numbers: Extremes and Rare Events. Springer. p. 80. ISBN 978-3-0348-0008-2.
  11. Alves, M.I.F.; de Haan, L.; Neves, C. (March 10, 2006). "Statistical inference for heavy and super-heavy tailed distributions" (PDF). Archived from the original (PDF) on June 23, 2007. Retrieved November 1, 2011.
  12. Nolan, John P. (2009). "Stable Distributions: Models for Heavy Tailed Data" (PDF). Archived from the original (PDF) on 2011-07-17. Retrieved 2009-02-21.
  13. Lihn, Stephen (2009). "Skew Lognormal Cascade Distribution". Archived from the original on 2014-04-07. Retrieved 2009-06-12.
  14. Novak S.Y. (2011). Extreme value methods with applications to finance. London: CRC. ISBN 978-1-43983-574-6.
  15. Pickands III, James (Jan 1975). "Statistical Inference Using Extreme Order Statistics". The Annals of Statistics. 3 (1): 119–131. doi:10.1214/aos/1176343003. JSTOR 2958083.
  16. Hill B.M. (1975). A simple general approach to inference about the tail of a distribution. Ann. Stat., v. 3, 1163–1174.
  17. Hall, P. (1982). On some estimates of an exponent of regular variation. J. R. Stat. Soc. Ser. B., v. 44, 37–42.
  18. Haeusler, E. and J. L. Teugels (1985). On asymptotic normality of Hill's estimator for the exponent of regular variation. Ann. Stat., v. 13, 743–756.
  19. Hsing, T. (1991). On tail index estimation using dependent data. Ann. Stat., v. 19, 1547–1569.
  20. Hill, J. (2010). On tail index estimation for dependent, heterogeneous data. Econometric Th., v. 26, 1398–1436.
  21. Resnick, S. and Starica, C. (1997). Asymptotic behavior of Hill's estimator for autoregressive data. Comm. Statist. Stochastic Models 13, 703–721.
  22. Ling, S. and Peng, L. (2004). Hill's estimator for the tail index of an ARMA model. J. Statist. Plann. Inference 123, 279–293.
  23. Hill, J. B. (2015). Tail index estimation for a filtered dependent time series. Stat. Sin. 25, 609–630.
  24. Lee, Seyoon; Kim, Joseph H. T. (2019). "Exponentiated generalized Pareto distribution: Properties and applications towards extreme value theory". Communications in Statistics - Theory and Methods. 48 (8): 2014–2038. arXiv:1708.01686. doi:10.1080/03610926.2018.1441418. S2CID 88514574.
  25. Goldie C.M., Smith R.L. (1987). Slow variation with remainder: theory and applications. Quart. J. Math. Oxford, v. 38, 45–71.
  26. Crovella, M. E.; Taqqu, M. S. (1999). "Estimating the Heavy Tail Index from Scaling Properties". Methodology and Computing in Applied Probability. 1: 55–79. doi:10.1023/A:1010012224103. S2CID 8917289.
  27. Markovich N.M. (2007). Nonparametric Analysis of Univariate Heavy-Tailed data: Research and Practice. Chichester: Wiley. ISBN 978-0-470-72359-3.
  28. Wand M.P., Jones M.C. (1995). Kernel smoothing. New York: Chapman and Hall. ISBN 978-0412552700.
  29. Hall P. (1992). The Bootstrap and Edgeworth Expansion. Springer. ISBN 9780387945088.