Split normal distribution

In probability theory and statistics, the split normal distribution, also known as the two-piece normal distribution, results from joining at the mode the corresponding halves of two normal distributions with the same mode but different variances. Johnson et al. [1] claim that this distribution was introduced by Gibbons and Mylroie [2] and by John. [3] However, these are two of several independent rediscoveries of the Zweiseitige Gauss'sche Gesetz introduced in the posthumously published Kollektivmasslehre (1897) [4] of Gustav Theodor Fechner (1801-1887); see Wallis (2014). [5] Another rediscovery appeared more recently in a finance journal. [6]

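Split-normal
Notation: $\mathcal{SN}(\mu, \sigma_1, \sigma_2)$
Parameters: $\mu \in \mathbb{R}$, mode (location, real)
$\sigma_1 > 0$, left-hand-side standard deviation (scale, real)
$\sigma_2 > 0$, right-hand-side standard deviation (scale, real)
Support: $x \in \mathbb{R}$
PDF: $A\exp\left(-\frac{(x-\mu)^2}{2\sigma_1^2}\right)$ if $x < \mu$; $A\exp\left(-\frac{(x-\mu)^2}{2\sigma_2^2}\right)$ otherwise, where $A = \sqrt{2/\pi}\,(\sigma_1+\sigma_2)^{-1}$
Mean: $\mu + \sqrt{2/\pi}\,(\sigma_2 - \sigma_1)$
Mode: $\mu$
Variance: $(1 - 2/\pi)(\sigma_2 - \sigma_1)^2 + \sigma_1\sigma_2$
Skewness (third central moment): $\sqrt{2/\pi}\,(\sigma_2 - \sigma_1)\left[(4/\pi - 1)(\sigma_2 - \sigma_1)^2 + \sigma_1\sigma_2\right]$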

Definition

The split normal distribution arises from merging two opposite halves of the probability density functions (PDFs) of two normal distributions at their common mode.

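The PDF of the split normal distribution is given by [1]

$$f(x;\mu,\sigma_1,\sigma_2) = \begin{cases} A\exp\left(-\dfrac{(x-\mu)^2}{2\sigma_1^2}\right) & \text{if } x < \mu, \\ A\exp\left(-\dfrac{(x-\mu)^2}{2\sigma_2^2}\right) & \text{otherwise,} \end{cases}$$

where

$$A = \sqrt{2/\pi}\,(\sigma_1+\sigma_2)^{-1}.$$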

Discussion

The split normal distribution results from merging two halves of normal distributions. In the general case the 'parent' normal distributions have different variances, so simply joining the two half-densities would give a function that is discontinuous at the mode. Scaling both halves by the common normalizing constant A makes the joined PDF continuous at the mode while ensuring that it integrates to 1.
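As an informal check on the role of A, the following minimal Python sketch evaluates the density in its standard form, $A\exp(-(x-\mu)^2/(2\sigma_i^2))$ with $A = \sqrt{2/\pi}/(\sigma_1+\sigma_2)$, and confirms numerically that it integrates to approximately 1 and takes the same value on both sides of the mode. The names `split_normal_pdf` and `integrate` and the parameter values are illustrative assumptions, not taken from the cited sources.

```python
import math


def split_normal_pdf(x, mu, sigma1, sigma2):
    """Split normal density: mode mu, left scale sigma1, right scale sigma2."""
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("sigma1 and sigma2 must be positive")
    # Common normalizing constant A = sqrt(2/pi) / (sigma1 + sigma2).
    a = math.sqrt(2.0 / math.pi) / (sigma1 + sigma2)
    # The left half uses sigma1, the right half (including the mode) sigma2.
    sigma = sigma1 if x < mu else sigma2
    return a * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def integrate(f, lo, hi, n=20000):
    """Midpoint-rule quadrature (a hypothetical helper for this sketch)."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


# Illustrative parameter values (assumed for the example).
mu, s1, s2 = 1.0, 0.5, 2.0

# Integrate over ten scale units on each side of the mode.
total = integrate(lambda x: split_normal_pdf(x, mu, s1, s2),
                  mu - 10 * s1, mu + 10 * s2)

# Density just left and just right of the mode.
left = split_normal_pdf(mu - 1e-9, mu, s1, s2)
right = split_normal_pdf(mu + 1e-9, mu, s1, s2)

print("integral:", total)
print("jump at the mode:", abs(left - right))
```

The integral should come out very close to 1 and the jump at the mode close to 0, whatever positive values are chosen for the two scale parameters.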

In the special case $\sigma_1^2 = \sigma_2^2 = \sigma_*^2$, the split normal distribution reduces to the normal distribution with variance $\sigma_*^2$.

When $\sigma_2 \neq \sigma_1$, the constant A differs from the normalizing constant of the normal distribution. However, when $\sigma_1^2 = \sigma_2^2 = \sigma_*^2$ the constants are equal.

The sign of the third central moment is determined by the difference $(\sigma_2 - \sigma_1)$. If this difference is positive, the distribution is skewed to the right; if it is negative, the distribution is skewed to the left.
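The dependence of the skew on $\sigma_2 - \sigma_1$ can be illustrated with the standard closed-form moments of the split normal: mean $\mu + \sqrt{2/\pi}(\sigma_2-\sigma_1)$, variance $(1-2/\pi)(\sigma_2-\sigma_1)^2 + \sigma_1\sigma_2$, and third central moment $\sqrt{2/\pi}(\sigma_2-\sigma_1)[(4/\pi-1)(\sigma_2-\sigma_1)^2 + \sigma_1\sigma_2]$. The sketch below simply evaluates these expressions; the function name `split_normal_moments` and the parameter values are assumptions made for illustration.

```python
import math


def split_normal_moments(mu, sigma1, sigma2):
    """Return (mean, variance, third central moment) of the split normal."""
    d = sigma2 - sigma1                 # the difference that drives the skew
    c = math.sqrt(2.0 / math.pi)
    mean = mu + c * d
    var = (1.0 - 2.0 / math.pi) * d * d + sigma1 * sigma2
    # The bracket is always positive, so the sign of m3 is the sign of d.
    m3 = c * d * ((4.0 / math.pi - 1.0) * d * d + sigma1 * sigma2)
    return mean, var, m3


# Right-skewed, left-skewed and symmetric cases (illustrative values).
results = {}
for s1, s2 in [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5)]:
    results[(s1, s2)] = split_normal_moments(0.0, s1, s2)
    mean, var, m3 = results[(s1, s2)]
    print(f"sigma1={s1}, sigma2={s2}: mean={mean:.4f}, var={var:.4f}, m3={m3:.4f}")
```

In the symmetric case the mean equals the mode, the variance equals the common $\sigma_*^2$, and the third central moment vanishes, as expected for a normal distribution.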

Other properties of the split normal density were discussed by Johnson et al. [1] and Julio. [7]

Alternative formulations

The formulation discussed above originates from John. [3] The literature offers two mathematically equivalent alternative parameterizations. Britton, Fisher and Whitley [8] offer a parameterization in terms of mode, dispersion and normed skewness, denoted $\mathcal{SN}(\mu, \sigma^2, \gamma)$. The parameter $\mu$ is the mode and is equivalent to the mode in John's formulation. The parameter $\sigma^2 > 0$ describes the dispersion (scale) and should not be confused with the variance. The third parameter, $\gamma \in (-1, 1)$, is the normalized skew.

The second alternative parameterization is used in the Bank of England's communication. It is written in terms of mode, dispersion and unnormed skewness and is denoted $\mathcal{SN}(\mu, \sigma^2, \xi)$. In this formulation the parameter $\mu$ is the mode, identical to that in John's [3] and Britton, Fisher and Whitley's [8] formulations. The parameter $\sigma^2$ describes the dispersion (scale) and is the same as in Britton, Fisher and Whitley's formulation. The parameter $\xi$ equals the difference between the distribution's mean and mode and can be viewed as an unnormed measure of skewness.

The three parameterizations are mathematically equivalent: there is a one-to-one relationship between the parameters, so it is possible to convert from one parameterization to another. The following relationships hold: [9]

$$\sigma^2 = \sigma_1^2 (1 + \gamma) = \sigma_2^2 (1 - \gamma)$$

$$\gamma = \frac{\sigma_2^2 - \sigma_1^2}{\sigma_2^2 + \sigma_1^2}$$

$$\xi = \sqrt{2/\pi}\,(\sigma_2 - \sigma_1)$$

$$\gamma = \operatorname{sgn}(\xi)\sqrt{1 - \left(\frac{\sqrt{1 + 2\beta} - 1}{\beta}\right)^2}, \quad \text{where } \beta = \frac{\pi\xi^2}{2\sigma^2}.$$
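A minimal Python sketch of these conversions is given below, assuming the usual relationships $\sigma^2 = \sigma_1^2(1+\gamma) = \sigma_2^2(1-\gamma)$, $\gamma = (\sigma_2^2-\sigma_1^2)/(\sigma_2^2+\sigma_1^2)$, $\xi = \sqrt{2/\pi}(\sigma_2-\sigma_1)$, and $\gamma = \operatorname{sgn}(\xi)\sqrt{1-((\sqrt{1+2\beta}-1)/\beta)^2}$ with $\beta = \pi\xi^2/(2\sigma^2)$. The function names (`john_to_bfw`, `bfw_to_john`, `john_to_boe`, `boe_to_bfw`) are hypothetical labels for the three parameterizations and do not come from any library.

```python
import math


def john_to_bfw(mu, sigma1, sigma2):
    """(mu, sigma1, sigma2) -> (mu, sigma^2, gamma), normed-skewness form."""
    v1, v2 = sigma1 ** 2, sigma2 ** 2
    gamma = (v2 - v1) / (v2 + v1)
    sigma_sq = v1 * (1.0 + gamma)        # equals v2 * (1 - gamma)
    return mu, sigma_sq, gamma


def bfw_to_john(mu, sigma_sq, gamma):
    """(mu, sigma^2, gamma) -> (mu, sigma1, sigma2)."""
    return (mu,
            math.sqrt(sigma_sq / (1.0 + gamma)),
            math.sqrt(sigma_sq / (1.0 - gamma)))


def john_to_boe(mu, sigma1, sigma2):
    """(mu, sigma1, sigma2) -> (mu, sigma^2, xi), unnormed-skewness form."""
    _, sigma_sq, _ = john_to_bfw(mu, sigma1, sigma2)
    xi = math.sqrt(2.0 / math.pi) * (sigma2 - sigma1)   # mean minus mode
    return mu, sigma_sq, xi


def boe_to_bfw(mu, sigma_sq, xi):
    """(mu, sigma^2, xi) -> (mu, sigma^2, gamma)."""
    if xi == 0.0:
        return mu, sigma_sq, 0.0         # symmetric case, beta would be 0
    beta = math.pi * xi ** 2 / (2.0 * sigma_sq)
    s = (math.sqrt(1.0 + 2.0 * beta) - 1.0) / beta
    # max() guards against a tiny negative value caused by rounding.
    gamma = math.copysign(math.sqrt(max(0.0, 1.0 - s * s)), xi)
    return mu, sigma_sq, gamma


# Round trip on illustrative values.
bfw = john_to_bfw(0.0, 1.0, 2.0)
boe = john_to_boe(0.0, 1.0, 2.0)
back = bfw_to_john(*bfw)
gamma_from_xi = boe_to_bfw(*boe)[2]
print(bfw, boe, back, gamma_from_xi)
```

Converting to the normed-skewness form and back should recover the original scale parameters, and the value of $\gamma$ recovered from $\xi$ should agree with the one computed directly from $\sigma_1$ and $\sigma_2$.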

Multivariate extensions

The multivariate generalization of the split normal distribution was proposed by Villani and Larsson. [10] They assume that each of the principal components has a univariate split normal distribution with its own set of parameters $\mu$, $\sigma_2$ and $\sigma_1$.

Estimation of parameters

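John [3] proposes estimating the parameters by the maximum likelihood method. He shows that the likelihood function can be expressed in an intensive form, in which the scale parameters $\sigma_1$ and $\sigma_2$ are functions of the location parameter $\mu$. The likelihood in its intensive form is:

$$L(\mu) = -\left[\sum_{x_i:\,x_i<\mu} (x_i-\mu)^2\right]^{1/3} - \left[\sum_{x_i:\,x_i>\mu} (x_i-\mu)^2\right]^{1/3}$$

and has to be maximized numerically with respect to the single parameter $\mu$ only.

Given the maximum likelihood estimator $\hat{\mu}$, the other parameters take the values:

$$\hat{\sigma}_1^2 = \frac{-L(\hat{\mu})}{N}\left[\sum_{x_i:\,x_i<\hat{\mu}} (x_i-\hat{\mu})^2\right]^{2/3},$$

$$\hat{\sigma}_2^2 = \frac{-L(\hat{\mu})}{N}\left[\sum_{x_i:\,x_i>\hat{\mu}} (x_i-\hat{\mu})^2\right]^{2/3},$$

where N is the number of observations.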
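The procedure can be sketched in Python as follows, assuming the intensive form $L(\mu) = -S_1(\mu)^{1/3} - S_2(\mu)^{1/3}$, where $S_1$ and $S_2$ are the sums of squared deviations of the observations below and above $\mu$, and the scale estimates $\hat{\sigma}_j^2 = [-L(\hat{\mu})/N]\,S_j(\hat{\mu})^{2/3}$. The repeated grid search is a simple stand-in for whatever numerical optimizer one prefers and is not John's own algorithm; the function names, the sampler and the simulated data are all illustrative assumptions.

```python
import math
import random


def sample_split_normal(rng, mu, sigma1, sigma2):
    """Draw one value: pick a side with probability sigma_j / (sigma1 + sigma2)."""
    z = abs(rng.gauss(0.0, 1.0))
    if rng.random() < sigma1 / (sigma1 + sigma2):
        return mu - sigma1 * z
    return mu + sigma2 * z


def profile_loglik(mu, xs):
    """Intensive-form likelihood L(mu) plus the two sums of squares."""
    s1 = sum((x - mu) ** 2 for x in xs if x < mu)
    s2 = sum((x - mu) ** 2 for x in xs if x > mu)
    return -(s1 ** (1.0 / 3.0)) - (s2 ** (1.0 / 3.0)), s1, s2


def fit_split_normal(xs, grid=100, rounds=3):
    """Maximize L(mu) by repeated grid search, then recover the scales."""
    lo, hi = min(xs), max(xs)
    best_mu = lo
    for _ in range(rounds):
        step = (hi - lo) / grid
        best_mu = max((lo + i * step for i in range(grid + 1)),
                      key=lambda m: profile_loglik(m, xs)[0])
        # Zoom in around the best grid point for the next round.
        lo, hi = best_mu - step, best_mu + step
    value, s1, s2 = profile_loglik(best_mu, xs)
    n = len(xs)
    sigma1 = math.sqrt(-value / n * s1 ** (2.0 / 3.0))
    sigma2 = math.sqrt(-value / n * s2 ** (2.0 / 3.0))
    return best_mu, sigma1, sigma2


# Simulated data with known parameters (assumed values, fixed seed).
rng = random.Random(0)
data = [sample_split_normal(rng, 1.0, 0.5, 2.0) for _ in range(2000)]
mu_hat, s1_hat, s2_hat = fit_split_normal(data)
print(mu_hat, s1_hat, s2_hat)
```

Because the search is over a single parameter, a coarse grid followed by local refinement is usually adequate for illustration; a bracketing optimizer could be substituted without changing the rest of the sketch.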

Villani and Larsson [10] propose using either the maximum likelihood method or Bayesian estimation and provide some analytical results for both the univariate and multivariate cases.

Applications

The split normal distribution has been used mainly in econometrics and time series analysis. A notable application is the construction of the fan chart, a representation of the inflation forecast distribution reported by inflation-targeting central banks around the globe. [7] [11]

References

  1. Johnson, N.L.; Kotz, S.; Balakrishnan, N. (1994). Continuous Univariate Distributions, Volume 1. John Wiley & Sons. p. 173. ISBN 978-0-471-58495-7.
  2. Gibbons, J.F.; Mylroie, S. (1973). "Estimation of impurity profiles in ion-implanted amorphous targets using joined half-Gaussian distributions". Applied Physics Letters. 22 (11): 568–569. Bibcode:1973ApPhL..22..568G. doi:10.1063/1.1654511.
  3. John, S. (1982). "The three-parameter two-piece normal family of distributions and its fitting". Communications in Statistics - Theory and Methods. 11 (8): 879–885. doi:10.1080/03610928208828279.
  4. Fechner, G.T. (ed. Lipps, G.F.) (1897). Kollektivmasslehre. Engelmann, Leipzig.
  5. Wallis, K.F. (2014). "The two-piece normal, binormal, or double Gaussian distribution: its origin and rediscoveries". Statistical Science. 29 (1): 106–112. doi:10.1214/13-STS417.
  6. de Roon, F.; Karehnke, P. (2016). "A simple skewed distribution with asset pricing applications". Review of Finance. 2016: 1–29.
  7. Julio, Juan Manuel (2007). The Fan Chart: The Technical Details of the New Implementation. Banco de la República. Retrieved 2010-09-11.
  8. Britton, E.; Fisher, P.; Whitley, J. (1998). "The inflation report projections: understanding the fan chart". Quarterly Bulletin. February 1998: 30–37.
  9. Banerjee, N.; Das, A. (2011). Fan Chart: Methodology and its Application to Inflation Forecasting in India. Reserve Bank of India Working Paper Series.
  10. Villani, Mattias; Larsson, Rolf (2006). "The Multivariate Split Normal Distribution and Asymmetric Principal Components Analysis". Communications in Statistics - Theory and Methods. 35 (6): 1123–1140. CiteSeerX 10.1.1.533.4095. doi:10.1080/03610920600672252. ISSN 0361-0926. S2CID 124959166.
  11. Bank of England, Inflation Report. Archived 2010-08-13 at the Wayback Machine.