Truncated normal distribution

	Figure (probability density function): the truncated normal distribution for different sets of parameters. In all cases, a = −10 and b = 10. Black: μ = −8, σ = 2; blue: μ = 0, σ = 2; red: μ = 9, σ = 10; orange: μ = 0, σ = 10.
	Figure (cumulative distribution function): the truncated normal distribution for different sets of parameters. In all cases, a = −10 and b = 10. Black: μ = −8, σ = 2; blue: μ = 0, σ = 2; red: μ = 9, σ = 10; orange: μ = 0, σ = 10.

Last updated January 09, 2025

In probability and statistics, the truncated normal distribution is the probability distribution derived from that of a normally distributed random variable by bounding the random variable from either below or above (or both). The truncated normal distribution has wide applications in statistics and econometrics.

Definitions

Suppose $X$ has a normal distribution with mean $\mu$ and variance $\sigma ^{2}$ , and let $-\infty \leq a<b\leq \infty$ . Then $X$ conditional on $a<X<b$ has a truncated normal distribution.

Its probability density function, $f$ , for $a\leq x\leq b$ , is given by

$f(x;\mu ,\sigma ,a,b)={\frac {1}{\sigma }}\,{\frac {\varphi ({\frac {x-\mu }{\sigma }})}{\Phi ({\frac {b-\mu }{\sigma }})-\Phi ({\frac {a-\mu }{\sigma }})}}$

and by $f=0$ otherwise.

Here, $\varphi (\xi )={\frac {1}{\sqrt {2\pi }}}\exp \left(-{\frac {1}{2}}\xi ^{2}\right)$ is the probability density function of the standard normal distribution and $\Phi (\cdot )$ is its cumulative distribution function $\Phi (x)={\frac {1}{2}}\left(1+\operatorname {erf} (x/{\sqrt {2}})\right).$ By definition, if $b=\infty$ , then $\Phi \left({\tfrac {b-\mu }{\sigma }}\right)=1$ , and similarly, if $a=-\infty$ , then $\Phi \left({\tfrac {a-\mu }{\sigma }}\right)=0$ .
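The density above can be evaluated directly from its definition. The following is a minimal Python sketch, not a reference implementation; the function names `std_normal_pdf`, `std_normal_cdf` and `truncnorm_pdf` are illustrative choices, and $\sigma >0$ is assumed.

```python
import math

def std_normal_pdf(xi):
    """Standard normal density phi(xi)."""
    return math.exp(-0.5 * xi * xi) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(x):
    """Standard normal CDF Phi(x), written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def truncnorm_pdf(x, mu, sigma, a=-math.inf, b=math.inf):
    """Density f(x; mu, sigma, a, b); assumes sigma > 0 and a < b."""
    if x < a or x > b:
        return 0.0  # f = 0 outside [a, b]
    # Normalising constant Phi(beta) - Phi(alpha); math.erf handles +/-inf,
    # so an infinite bound contributes exactly 1 or 0 as in the text.
    z = std_normal_cdf((b - mu) / sigma) - std_normal_cdf((a - mu) / sigma)
    return std_normal_pdf((x - mu) / sigma) / (sigma * z)

# Example: the "blue" parameter set from the figure (mu = 0, sigma = 2, a = -10, b = 10).
print(truncnorm_pdf(0.0, 0.0, 2.0, -10.0, 10.0))
```

For bounds deep in one tail the difference of the two CDF values loses precision, so this direct form is only suitable when the interval carries non-negligible probability.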

The above formulae show that when $-\infty <a<b<+\infty$ the scale parameter $\sigma ^{2}$ of the truncated normal distribution is allowed to assume negative values. The parameter $\sigma$ is in this case imaginary, but the function $f$ is nevertheless real, positive, and normalizable. The scale parameter $\sigma ^{2}$ of the untruncated normal distribution must be positive because the distribution would not be normalizable otherwise. The doubly truncated normal distribution, on the other hand, can in principle have a negative scale parameter (which is different from the variance, see summary formulae), because no such integrability problems arise on a bounded domain. In this case the distribution cannot be interpreted as an untruncated normal conditional on $a<X<b$ , of course, but can still be interpreted as a maximum-entropy distribution with first and second moments as constraints, and has an additional peculiar feature: it presents two local maxima instead of one, located at $x=a$ and $x=b$ .
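The two-maxima behaviour for a negative scale parameter can be checked numerically. The sketch below is an illustration under stated assumptions: it takes the density to be proportional to $\exp(-(x-\mu )^{2}/(2\sigma ^{2}))$ on $[a,b]$ with $\sigma ^{2}<0$ and normalises it with the trapezoidal rule rather than with the closed-form constant. The function name `negative_scale_density` and the grid size are hypothetical.

```python
import math

def negative_scale_density(mu, s, a, b, n=2000):
    """Grid xs and normalised density values fs on [a, b] for s = sigma^2 < 0.

    Assumption: the density is proportional to exp(-(x - mu)^2 / (2 s)),
    normalised numerically (trapezoidal rule) instead of in closed form.
    """
    if not (s < 0 and math.isfinite(a) and math.isfinite(b) and a < b):
        raise ValueError("need s < 0 and finite bounds with a < b")
    h = (b - a) / n
    xs = [a + i * h for i in range(n + 1)]
    # With s < 0 the exponent is positive and grows away from mu.
    ws = [math.exp(-(x - mu) ** 2 / (2.0 * s)) for x in xs]
    total = h * (sum(ws) - 0.5 * (ws[0] + ws[-1]))
    return xs, [w / total for w in ws]

xs, fs = negative_scale_density(mu=0.0, s=-4.0, a=-2.0, b=3.0)
# The density is smallest near mu and largest at the two endpoints.
print(xs[fs.index(min(fs))], fs[0], fs[-1])
```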

Properties

The truncated normal is one of two possible maximum entropy probability distributions for a fixed mean and variance constrained to the interval [a,b], the other being the truncated U.^[2] Truncated normals with fixed support form an exponential family. Nielsen^[3] reported closed-form formulas for the Kullback–Leibler divergence and the Bhattacharyya distance between two truncated normal distributions with the support of the first distribution nested in the support of the second.

Moments

If the random variable has been truncated only from below, some probability mass has been shifted to higher values, giving a first-order stochastically dominating distribution and hence increasing the mean to a value higher than the mean $\mu$ of the original normal distribution. Likewise, if the random variable has been truncated only from above, the truncated distribution has a mean less than $\mu .$

Regardless of whether the random variable is bounded above, below, or both, the truncation is a mean-preserving contraction combined with a mean-changing rigid shift, and hence the variance of the truncated distribution is less than the variance $\sigma ^{2}$ of the original normal distribution.

Two sided truncation^[4]

Let $\alpha =(a-\mu )/\sigma$ and $\beta =(b-\mu )/\sigma$ . Then

$\operatorname {E} (X\mid a<X<b)=\mu -\sigma {\frac {\varphi (\beta )-\varphi (\alpha )}{\Phi (\beta )-\Phi (\alpha )}}$

and

$\operatorname {Var} (X\mid a<X<b)=\sigma ^{2}\left[1-{\frac {\beta \varphi (\beta )-\alpha \varphi (\alpha )}{\Phi (\beta )-\Phi (\alpha )}}-\left({\frac {\varphi (\beta )-\varphi (\alpha )}{\Phi (\beta )-\Phi (\alpha )}}\right)^{2}\right].$

Care must be taken in the numerical evaluation of these formulas, which can result in catastrophic cancellation when the interval $[a,b]$ does not include $\mu$ . There are better ways to rewrite them that avoid this issue.^[5]
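As an illustration, the two-sided formulas can be transcribed directly into Python using the standard library's `statistics.NormalDist`. This is the naive form, so it inherits the cancellation problem just described and is not one of the rewritten, numerically stable versions. The function name `truncnorm_mean_var` is an illustrative choice, and finite bounds with $\sigma >0$ are assumed.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal: provides pdf (phi) and cdf (Phi)

def truncnorm_mean_var(mu, sigma, a, b):
    """Mean and variance of N(mu, sigma^2) truncated to (a, b).

    Naive transcription of the two-sided formulas; assumes finite a < b
    and sigma > 0, and is inaccurate when (a, b) lies far in one tail.
    """
    alpha = (a - mu) / sigma
    beta = (b - mu) / sigma
    z = _STD.cdf(beta) - _STD.cdf(alpha)
    d = (_STD.pdf(beta) - _STD.pdf(alpha)) / z
    t = (beta * _STD.pdf(beta) - alpha * _STD.pdf(alpha)) / z
    mean = mu - sigma * d
    var = sigma ** 2 * (1.0 - t - d ** 2)
    return mean, var

# Standard normal truncated to (-1, 1): the mean stays 0 by symmetry and
# the variance drops below 1.
print(truncnorm_mean_var(0.0, 1.0, -1.0, 1.0))
```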

One sided truncation (of lower tail)^[6]

In this case $b=\infty$ , so that $\varphi (\beta )=0$ and $\Phi (\beta )=1$ . Then

$\operatorname {E} (X\mid X>a)=\mu +\sigma \varphi (\alpha )/Z,\!$

and

$\operatorname {Var} (X\mid X>a)=\sigma ^{2}[1+\alpha \varphi (\alpha )/Z-(\varphi (\alpha )/Z)^{2}],$

where $Z=1-\Phi (\alpha ).$
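As a quick check of these two formulas, consider the illustrative special case $a=\mu$ (truncation at the mean). Then $\alpha =0$ , $\varphi (0)=1/{\sqrt {2\pi }}$ and $Z=1/2$ , so $\varphi (\alpha )/Z={\sqrt {2/\pi }}$ and

$\operatorname {E} (X\mid X>\mu )=\mu +\sigma {\sqrt {2/\pi }},\qquad \operatorname {Var} (X\mid X>\mu )=\sigma ^{2}\left(1-{\frac {2}{\pi }}\right),$

which are the mean and variance of a half-normal distribution shifted by $\mu$ .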

One sided truncation (of upper tail)

In this case $a=\alpha =-\infty$ , so that $\varphi (\alpha )=0$ and $\Phi (\alpha )=0$ . Then

$\operatorname {E} (X\mid X<b)=\mu -\sigma {\frac {\varphi (\beta )}{\Phi (\beta )}},$

and

$\operatorname {Var} (X\mid X<b)=\sigma ^{2}\left[1-\beta {\frac {\varphi (\beta )}{\Phi (\beta )}}-\left({\frac {\varphi (\beta )}{\Phi (\beta )}}\right)^{2}\right].$

Barr & Sherrill (1999) give a simpler expression for the variance of one sided truncations. Their formula is in terms of the chi-square CDF, which is implemented in standard software libraries. Bebu & Mathew (2009) provide formulas for (generalized) confidence intervals around the truncated moments.

A recursive formula

As for the non-truncated case, there is a recursive formula for the truncated moments.^[7]

Multivariate

Computing the moments of a multivariate truncated normal is harder.

Generating values from the truncated normal distribution

A random variate $x$ defined as $x=\Phi ^{-1}(\Phi (\alpha )+U\cdot (\Phi (\beta )-\Phi (\alpha )))\sigma +\mu$ with $\Phi$ the cumulative distribution function and $\Phi ^{-1}$ its inverse, $U$ a uniform random number on $(0,1)$ , follows the distribution truncated to the range $(a,b)$ . This is simply the inverse transform method for simulating random variables. Although one of the simplest, this method can either fail when sampling in the tail of the normal distribution,^[8] or be much too slow.^[9] Thus, in practice, one has to find alternative methods of simulation.

One such truncated normal generator (implemented in Matlab and, in R, as trandn.R) is based on an acceptance-rejection idea due to Marsaglia.^[10] Despite the slightly suboptimal acceptance rate of Marsaglia (1964) in comparison with Robert (1995), Marsaglia's method is typically faster,^[9] because it does not require the costly numerical evaluation of the exponential function.
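For orientation, the classic one-sided tail rejection scheme usually attributed to Marsaglia (1964) can be sketched as follows. This is an illustration of the idea only, not the trandn implementation: it samples a standard normal conditional on exceeding a positive threshold, and the function name `sample_normal_tail` is hypothetical.

```python
import math
import random

def sample_normal_tail(a, rng=random):
    """One draw of Z ~ N(0, 1) conditional on Z >= a, for a > 0.

    Proposal: x = sqrt(a^2 - 2 ln U), whose density on [a, inf) is
    proportional to x * exp(-x^2 / 2); accept with probability a / x.
    """
    if a <= 0.0:
        raise ValueError("this sketch only covers a > 0")
    while True:
        u = 1.0 - rng.random()  # in (0, 1], so log(u) is finite
        v = rng.random()
        x = math.sqrt(a * a - 2.0 * math.log(u))
        if v * x <= a:  # acceptance test needs no exponential
            return x

rng = random.Random(1)
print([sample_normal_tail(3.0, rng) for _ in range(3)])
```

A draw from $N(\mu ,\sigma ^{2})$ conditional on $X>a$ with $a>\mu$ then follows as $\mu +\sigma z$ , where $z$ is drawn with threshold $(a-\mu )/\sigma$ .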

For more on simulating a draw from the truncated normal distribution, see Robert (1995), Lynch (2007, Section 8.1.3, pages 200–206), and Devroye (1986). The msm package in R has a function, rtnorm, that generates draws from a truncated normal distribution. The truncnorm package in R also has functions to draw from a truncated normal.

Chopin (2011) proposed an algorithm inspired by the Ziggurat algorithm of Marsaglia and Tsang (1984, 2000), which is usually considered the fastest Gaussian sampler. Chopin's algorithm is also very close to Ahrens's algorithm (1995). Implementations can be found in C, C++, Matlab and Python.

Sampling from the multivariate truncated normal distribution is considerably more difficult.^[11] Exact or perfect simulation is only feasible in the case of truncation of the normal distribution to a polytope region.^[11]^[12] In more general cases, Damien & Walker (2001) introduce a general methodology for sampling truncated densities within a Gibbs sampling framework. Their algorithm introduces one latent variable and, within a Gibbs sampling framework, it is more computationally efficient than the algorithm of Robert (1995).

Notes

[1] "Lecture 4: Selection" (PDF). web.ist.utl.pt. Instituto Superior Técnico. November 11, 2002. p. 1. Retrieved 14 July 2015.
[2] Dowson, D.; Wragg, A. (September 1973). "Maximum-entropy distributions having prescribed first and second moments (Corresp.)". IEEE Transactions on Information Theory. 19 (5): 689–693. doi:10.1109/TIT.1973.1055060. ISSN 1557-9654.
[3] Nielsen, Frank (2022). "Statistical Divergences between Densities of Truncated Exponential Families with Nested Supports: Duo Bregman and Duo Jensen Divergences". Entropy. 24 (3). MDPI: 421. Bibcode:2022Entrp..24..421N. doi:10.3390/e24030421. PMC 8947456. PMID 35327931.
[4] Johnson, Norman Lloyd; Kotz, Samuel; Balakrishnan, N. (1994). Continuous Univariate Distributions. Vol. 1 (2nd ed.). New York: Wiley. Section 10.1. ISBN 0-471-58495-9. OCLC 29428092.
[5] Fernandez-de-Cossio-Diaz, Jorge (2017-12-06), TruncatedNormal.jl: Compute mean and variance of the univariate truncated normal distribution (works far from the peak), retrieved 2017-12-06.
[6] Greene, William H. (2003). Econometric Analysis (5th ed.). Prentice Hall. ISBN 978-0-13-066189-0.
[7] Document by Eric Orjebin, https://people.smp.uq.edu.au/YoniNazarathy/teaching_projects/studentWork/EricOrjebin_TruncatedNormalMoments.pdf
[8] Kroese, D. P.; Taimre, T.; Botev, Z. I. (2011). Handbook of Monte Carlo methods. John Wiley & Sons.
[9] Botev, Z. I.; L'Ecuyer, P. (2017). "Simulation from the Normal Distribution Truncated to an Interval in the Tail". 10th EAI International Conference on Performance Evaluation Methodologies and Tools. 25th–28th Oct 2016 Taormina, Italy: ACM. pp. 23–29. doi:10.4108/eai.25-10-2016.2266879. ISBN 978-1-63190-141-6.
[10] Marsaglia, George (1964). "Generating a variable from the tail of the normal distribution". Technometrics. 6 (1): 101–102. doi:10.2307/1266749. JSTOR 1266749.
[11] Botev, Z. I. (2016). "The normal law under linear restrictions: simulation and estimation via minimax tilting". Journal of the Royal Statistical Society, Series B. 79: 125–148. arXiv:1603.04166. doi:10.1111/rssb.12162. S2CID 88515228.
[12] Botev, Zdravko & L'Ecuyer, Pierre (2018). "Chapter 8: Simulation from the Tail of the Univariate and Multivariate Normal Distribution". In Puliafito, Antonio (ed.). Systems Modeling: Methodologies and Tools. EAI/Springer Innovations in Communication and Computing. Springer, Cham. pp. 115–132. doi:10.1007/978-3-319-92378-9_8. ISBN 978-3-319-92377-2. S2CID 125554530.
[13] Sun, Jingchao; Kong, Maiying; Pal, Subhadip (22 June 2021). "The Modified-Half-Normal distribution: Properties and an efficient sampling scheme". Communications in Statistics - Theory and Methods. 52 (5): 1591–1613. doi:10.1080/03610926.2021.1934700. ISSN 0361-0926. S2CID 237919587.

References

Botev, Zdravko & L'Ecuyer, Pierre (2018). "Chapter 8: Simulation from the Tail of the Univariate and Multivariate Normal Distribution". In Puliafito, Antonio (ed.). Systems Modeling: Methodologies and Tools. EAI/Springer Innovations in Communication and Computing. Springer, Cham. pp. 115–132. doi:10.1007/978-3-319-92378-9_8. ISBN 978-3-319-92377-2. S2CID 125554530.
Devroye, Luc (1986). Non-Uniform Random Variate Generation (PDF). New York: Springer-Verlag. Archived from the original (PDF) on 2014-08-18. Retrieved 2012-04-12.
Greene, William H. (2003). Econometric Analysis (5th ed.). Prentice Hall. ISBN 978-0-13-066189-0.
Johnson, Norman L.; Kotz, Samuel (1970). Continuous univariate distributions-1, chapter 13. John Wiley & Sons.
Lynch, Scott (2007). Introduction to Applied Bayesian Statistics and Estimation for Social Scientists. New York: Springer. ISBN 978-1-4419-2434-6.
Robert, Christian P. (1995). "Simulation of truncated normal variables". Statistics and Computing. 5 (2): 121–125. arXiv: 0907.4010 . doi:10.1007/BF00143942. S2CID 15943491.
Barr, Donald R.; Sherrill, E.Todd (1999). "Mean and variance of truncated normal distributions". The American Statistician. 53 (4): 357–361. doi:10.1080/00031305.1999.10474490.
Bebu, Ionut; Mathew, Thomas (2009). "Confidence intervals for limited moments and truncated moments in normal and lognormal models". Statistics and Probability Letters. 79 (3): 375–380. doi:10.1016/j.spl.2008.09.006.
Damien, Paul; Walker, Stephen G. (2001). "Sampling truncated normal, beta, and gamma densities". Journal of Computational and Graphical Statistics. 10 (2): 206–215. doi:10.1198/10618600152627906. S2CID 123156320.
Chopin, Nicolas (2011-04-01). "Fast simulation of truncated Gaussian distributions". Statistics and Computing. 21 (2): 275–288. arXiv: 1201.6140 . doi:10.1007/s11222-009-9168-1. ISSN 1573-1375.
Burkardt, John. "The Truncated Normal Distribution" (PDF). Department of Scientific Computing website. Florida State University. Retrieved 15 February 2018.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
