Inverse transform sampling

Last updated September 09, 2024

Inverse transform sampling (also known as inversion sampling, the inverse probability integral transform, the inverse transformation method, or the Smirnov transform) is a basic method for pseudo-random number sampling, i.e., for generating sample numbers at random from any probability distribution given its cumulative distribution function.

Inverse transformation sampling takes uniform samples of a number $u$ between 0 and 1, interpreted as a probability, and then returns the smallest number $x\in \mathbb {R}$ such that $F(x)\geq u$ for the cumulative distribution function $F$ of a random variable. For example, imagine that $F$ is the standard normal distribution with mean zero and standard deviation one. The table below shows samples taken from the uniform distribution and their representation on the standard normal distribution.

Transformation from uniform sample to normal
$u$	$F^{-1}(u)$
.5	0
.975	1.95996
.995	2.5758
.999999	4.75342
1-2⁻⁵²	8.12589
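
The first rows of the table can be reproduced with any accurate implementation of the normal quantile function. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` as the quantile implementation; the helper name `normal_from_uniform` is introduced here for illustration only.

```python
from statistics import NormalDist

# Standard normal distribution: mean zero, standard deviation one.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def normal_from_uniform(u: float) -> float:
    """Map a probability u in (0, 1) to the standard normal quantile F^{-1}(u)."""
    return std_normal.inv_cdf(u)

# Print a few uniform values next to their normal quantiles.
for u in (0.5, 0.975, 0.995, 0.999999):
    print(f"u = {u:<9} ->  x = {normal_from_uniform(u):.5f}")
```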

We are randomly choosing a proportion of the area under the curve and returning the number in the domain such that exactly this proportion of the area occurs to the left of that number. Intuitively, we are unlikely to choose a number in the far ends of the tails, because there is very little area in them: landing there would require choosing a number very close to zero or one.

Computationally, this method involves computing the quantile function of the distribution — in other words, computing the cumulative distribution function (CDF) of the distribution (which maps a number in the domain to a probability between 0 and 1) and then inverting that function. This is the source of the term "inverse" or "inversion" in most of the names for this method. Note that for a discrete distribution, computing the CDF is not in general too difficult: we simply add up the individual probabilities for the various points of the distribution. For a continuous distribution, however, we need to integrate the probability density function (PDF) of the distribution, which is impossible to do analytically for most distributions (including the normal distribution). As a result, this method may be computationally inefficient for many distributions and other methods are preferred; however, it is a useful method for building more generally applicable samplers such as those based on rejection sampling.
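
For the discrete case described above, the CDF is a running sum and the quantile is the first point at which that sum reaches $u$. The following is a minimal sketch, assuming a finite distribution whose values are listed in ascending order; `make_discrete_quantile` and the example probabilities are hypothetical names and numbers chosen for illustration.

```python
import bisect
import itertools
import random

def make_discrete_quantile(values, probs):
    """Return a function mapping u to the smallest value x with F(x) >= u.

    `values` must be sorted in ascending order and `probs` must sum to 1.
    """
    cdf = list(itertools.accumulate(probs))  # running sums F(x_1), F(x_2), ...

    def quantile(u):
        # bisect_left returns the first index i with cdf[i] >= u.
        i = bisect.bisect_left(cdf, u)
        # Guard against the last running sum falling just below 1 by rounding.
        return values[min(i, len(values) - 1)]

    return quantile

quantile = make_discrete_quantile([1, 2, 3], [0.2, 0.5, 0.3])
rng = random.Random(0)
samples = [quantile(rng.random()) for _ in range(10_000)]
```

With enough samples, the relative frequencies of 1, 2 and 3 should approach 0.2, 0.5 and 0.3.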

For the normal distribution, the lack of an analytical expression for the corresponding quantile function means that other methods (e.g. the Box–Muller transform) may be preferred computationally. It is often the case that, even for simple distributions, the inverse transform sampling method can be improved on:^[1] see, for example, the ziggurat algorithm and rejection sampling. On the other hand, it is possible to approximate the quantile function of the normal distribution extremely accurately using moderate-degree polynomials, and in fact the method of doing this is fast enough that inversion sampling is now the default method for sampling from a normal distribution in the statistical package R.^[2]

Formal statement

For any random variable $X\in \mathbb {R}$ , the random variable $F_{X}^{-1}(U)$ has the same distribution as $X$ , where $F_{X}^{-1}$ is the generalized inverse of the cumulative distribution function $F_{X}$ of $X$ and $U$ is uniform on $[0,1]$ .^[3]

For continuous random variables, the inverse probability integral transform is indeed the inverse of the probability integral transform, which states that for a continuous random variable $X$ with cumulative distribution function $F_{X}$ , the random variable $U=F_{X}(X)$ is uniform on $[0,1]$ .
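
The probability integral transform can be checked empirically. The sketch below assumes an exponential distribution with an arbitrarily chosen rate `lam`, draws values with the standard library's `random.expovariate`, applies the CDF, and counts how many transformed values land in each tenth of $[0,1]$.

```python
import math
import random

rng = random.Random(1)
lam = 2.0  # rate parameter, chosen arbitrarily for this illustration

# Draw X ~ Exponential(lam), then apply its CDF F_X(x) = 1 - exp(-lam * x).
xs = [rng.expovariate(lam) for _ in range(20_000)]
us = [1.0 - math.exp(-lam * x) for x in xs]

# If U = F_X(X) is uniform on [0, 1], about a tenth of the values fall in each decile.
decile_counts = [0] * 10
for u in us:
    decile_counts[min(int(u * 10), 9)] += 1
decile_fractions = [count / len(us) for count in decile_counts]
print(decile_fractions)
```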

Figure (InverseFunc.png): Graph of the inversion technique from $x$ to $F(x)$. On the bottom right we see the regular function and in the top left its inversion.

Intuition

From $U\sim \mathrm {Unif} [0,1]$ , we want to generate $X$ with CDF $F_{X}(x).$ We assume $F_{X}(x)$ to be a continuous, strictly increasing function, which provides good intuition.

We want to see if we can find some strictly increasing transformation $T:[0,1]\to \mathbb {R}$ such that $T(U){\overset {d}{=}}X$ . Because $T$ is strictly increasing, $T(U)\leq x$ holds exactly when $U\leq T^{-1}(x)$ , so we will have

$F_{X}(x)=\Pr(X\leq x)=\Pr(T(U)\leq x)=\Pr(U\leq T^{-1}(x))=T^{-1}(x),{\text{ for }}x\in \mathbb {R} ,$

where the last step used that $\Pr(U\leq y)=y$ when $U$ is uniform on $[0,1]$ .

So $F_{X}$ is the inverse function of $T$ , or, equivalently, $T(u)=F_{X}^{-1}(u),u\in [0,1].$

Therefore, we can generate $X$ from $F_{X}^{-1}(U).$

The method

Figure (Inverse Transform Sampling Example.gif): An animation of how inverse transform sampling generates normally distributed random values from uniformly distributed random values.

The problem that the inverse transform sampling method solves is as follows:

Let $X$ be a random variable whose distribution can be described by the cumulative distribution function $F_{X}$ .
We want to generate values of $X$ which are distributed according to this distribution.

The inverse transform sampling method works as follows:

Generate a random number $u$ from the standard uniform distribution in the interval $[0,1]$ , i.e. from $U\sim \mathrm {Unif} [0,1].$
Find the generalized inverse of the desired CDF, i.e. $F_{X}^{-1}(u)$ .
Compute $X'(u)=F_{X}^{-1}(u)$ . The computed random variable $X'(U)$ has distribution $F_{X}$ and thereby the same law as $X$ .

Expressed differently, given a cumulative distribution function $F_{X}$ and a uniform variable $U\in [0,1]$ , the random variable $X=F_{X}^{-1}(U)$ has the distribution $F_{X}$ .^[3]
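
The three steps can be sketched as a small generic routine that takes the inverse CDF as an argument. This is an illustrative sketch, not a reference implementation: the function name `inverse_transform_sample` and the example distribution with CDF $F(x)=x^{2}$ on $[0,1]$ (so that $F^{-1}(u)={\sqrt {u}}$) are assumptions made for the example.

```python
import math
import random
from typing import Callable, List, Optional

def inverse_transform_sample(
    inverse_cdf: Callable[[float], float],
    n: int,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Draw n samples by applying a supplied inverse CDF to uniform draws."""
    rng = rng or random.Random()
    samples = []
    for _ in range(n):
        u = rng.random()                # step 1: u from the uniform distribution
        samples.append(inverse_cdf(u))  # steps 2 and 3: evaluate F^{-1}(u)
    return samples

# Example distribution (assumed for illustration): F(x) = x^2 on [0, 1].
samples = inverse_transform_sample(math.sqrt, 10_000, random.Random(7))
sample_mean = sum(samples) / len(samples)  # theoretical mean is 2/3
```

Passing the inverse CDF as a callable keeps the uniform draw separate from the distribution-specific part, which is the only piece that changes between distributions.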

In the continuous case, a treatment of such inverse functions as objects satisfying differential equations can be given.^[4] Some such differential equations admit explicit power series solutions, despite their non-linearity.^[5]

Examples

As an example, suppose we have a random variable $U\sim \mathrm {Unif} (0,1)$ and a cumulative distribution function

$F(x)=1-\exp(-{\sqrt {x}}).$

In order to perform an inversion we want to solve $F(F^{-1}(u))=u$ for $F^{-1}(u)$ :

${\begin{aligned}F(F^{-1}(u))&=u\\1-\exp \left(-{\sqrt {F^{-1}(u)}}\right)&=u\\F^{-1}(u)&=(-\log(1-u))^{2}\\&=(\log(1-u))^{2}\end{aligned}}$

With this closed form for $F^{-1}$ , we would perform steps one, two and three of the method described above.
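
A minimal sketch of this example in Python, assuming the standard-library generator `random.Random` as the source of uniform numbers: it draws samples through $F^{-1}(u)=(\log(1-u))^{2}$ and compares the empirical CDF with $F$ at a few points chosen for illustration.

```python
import math
import random

def cdf(x: float) -> float:
    """F(x) = 1 - exp(-sqrt(x)) for x > 0, and 0 otherwise."""
    return 1.0 - math.exp(-math.sqrt(x)) if x > 0 else 0.0

def inverse_cdf(u: float) -> float:
    """F^{-1}(u) = (log(1 - u))^2; log1p(-u) evaluates log(1 - u) accurately."""
    return math.log1p(-u) ** 2

rng = random.Random(42)
# rng.random() lies in [0, 1), so 1 - u is never zero and the logarithm is finite.
samples = [inverse_cdf(rng.random()) for _ in range(20_000)]

# Compare the empirical CDF with F at a few points.
for x in (0.25, 1.0, 4.0):
    empirical = sum(s <= x for s in samples) / len(samples)
    print(f"x = {x}: empirical {empirical:.3f}, exact {cdf(x):.3f}")
```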

As another example, we use the exponential distribution with $F_{X}(x)=1-e^{-\lambda x}$ for $x\geq 0$ (and 0 otherwise). By solving $y=F(x)$ we obtain the inverse function

$x=F^{-1}(y)=-{\frac {1}{\lambda }}\ln(1-y).$

This means that if we draw some $y_{0}$ from $U\sim \mathrm {Unif} (0,1)$ and compute

$x_{0}=F_{X}^{-1}(y_{0})=-{\frac {1}{\lambda }}\ln(1-y_{0}),$

then this $x_{0}$ has an exponential distribution.

The idea is illustrated in the following graph:

Note that the distribution does not change if we start with $1-y$ instead of $y$ , because $1-Y$ is also uniform on $[0,1]$ when $Y$ is. For computational purposes, it therefore suffices to generate random numbers $y$ in $[0,1]$ and then simply calculate

$x=-{\frac {1}{\lambda }}\ln(y).$
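
The exponential case can be sketched as follows. The function name `sample_exponential` and the rate value are assumptions for illustration; the sketch uses `1 - rng.random()` as the uniform input so that the argument of the logarithm lies in $(0,1]$ and is never zero.

```python
import math
import random

def sample_exponential(lam: float, rng: random.Random) -> float:
    """Draw one Exponential(lam) value via x = -ln(y) / lam."""
    # rng.random() is in [0, 1), so y = 1 - rng.random() is in (0, 1].
    y = 1.0 - rng.random()
    return -math.log(y) / lam

rng = random.Random(123)
lam = 2.0  # rate parameter, chosen arbitrarily
samples = [sample_exponential(lam, rng) for _ in range(20_000)]
sample_mean = sum(samples) / len(samples)  # theoretical mean is 1 / lam
```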

Proof of correctness

Let $F$ be a cumulative distribution function, and let $F^{-1}$ be its generalized inverse function (using the infimum because CDFs are weakly monotonic and right-continuous):^[6]

$F^{-1}(u)=\inf \;\{x\mid F(x)\geq u\}\qquad (0<u<1).$

Claim: If $U$ is a uniform random variable on $[0,1]$ then $F^{-1}(U)$ has $F$ as its CDF.

Proof:

${\begin{aligned}&\Pr(F^{-1}(U)\leq x)\\&{}=\Pr(U\leq F(x))\quad &(F{\text{ is right-continuous, so }}\{u:F^{-1}(u)\leq x\}=\{u:u\leq F(x)\})\\&{}=F(x)\quad &({\text{because }}\Pr(U\leq u)=u,{\text{ when }}U{\text{ is uniform on }}[0,1])\\\end{aligned}}$

Truncated distribution

Inverse transform sampling can be simply extended to cases of truncated distributions on the interval $(a,b]$ without the cost of rejection sampling: the same algorithm can be followed, but instead of generating a random number $u$ uniformly distributed between 0 and 1, generate $u$ uniformly distributed between $F(a)$ and $F(b)$ , and then again take $F^{-1}(u)$ .
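
As a sketch of the truncated case, the example below assumes an exponential distribution with rate `lam` truncated to an interval with endpoints `a` and `b`; the function names and parameter values are hypothetical. The only change from the untruncated sampler is the range of the uniform draw.

```python
import math
import random

def exp_cdf(x: float, lam: float) -> float:
    """CDF of the exponential distribution with rate lam."""
    return 1.0 - math.exp(-lam * x) if x > 0 else 0.0

def sample_truncated_exponential(lam: float, a: float, b: float,
                                 rng: random.Random) -> float:
    """Draw one exponential value conditioned to lie between a and b."""
    fa, fb = exp_cdf(a, lam), exp_cdf(b, lam)
    # u is uniform between F(a) and F(b) instead of between 0 and 1.
    u = fa + (fb - fa) * rng.random()
    # Same inverse CDF as in the untruncated case.
    return -math.log1p(-u) / lam

rng = random.Random(5)
lam, a, b = 1.0, 1.0, 3.0
samples = [sample_truncated_exponential(lam, a, b, rng) for _ in range(20_000)]
```

Every draw is used, whereas rejection sampling from the untruncated distribution would discard all values outside the interval.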

Reduction of the number of inversions

Obtaining a large number of samples normally requires the same number of inversions of the distribution. One possible way to reduce the number of inversions is the so-called Stochastic Collocation Monte Carlo sampler (SCMC sampler), applied within a polynomial chaos expansion framework. It generates any number of Monte Carlo samples from only a few inversions of the original distribution, combined with independent samples of a variable for which the inversions are analytically available, for example the standard normal variable.^[7]

Software implementations

There are software implementations available for applying the inverse sampling method by using numerical approximations of the inverse in the case that it is not available in closed form. For example, an approximation of the inverse can be computed if the user provides some information about the distributions such as the PDF ^[8] or the CDF.

C library UNU.RAN ^[9]
R library Runuran ^[10]
Python subpackage sampling in scipy.stats ^[11]^[12]
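
To illustrate what numerical inversion of a CDF means, the sketch below finds $F^{-1}(u)$ by bisection. It is a deliberately simple illustration and not the algorithm used by the libraries listed above; the function names, the bracketing interval and the tolerance are assumptions made for the example.

```python
import math
from typing import Callable

def numerical_inverse_cdf(cdf: Callable[[float], float], u: float,
                          lo: float, hi: float, tol: float = 1e-10) -> float:
    """Approximate inf{x : F(x) >= u} by bisection on the interval [lo, hi].

    Assumes cdf is nondecreasing with cdf(lo) < u <= cdf(hi).
    """
    for _ in range(200):  # hard cap on iterations as a safety net
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if cdf(mid) >= u:
            hi = mid  # keep the invariant cdf(hi) >= u
        else:
            lo = mid  # keep the invariant cdf(lo) < u
    return hi

def std_normal_cdf(x: float) -> float:
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

x = numerical_inverse_cdf(std_normal_cdf, 0.975, -10.0, 10.0)
print(f"{x:.5f}")
```

Bisection needs only CDF evaluations and a bracketing interval, at the cost of several dozen evaluations per sample.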

References

[1] Luc Devroye (1986). Non-Uniform Random Variate Generation (PDF). New York: Springer-Verlag. Archived from the original (PDF) on 2014-08-18. Retrieved 2012-04-12.
[2] "R: Random Number Generation".
[3] McNeil, Alexander J.; Frey, Rüdiger; Embrechts, Paul (2005). Quantitative risk management. Princeton Series in Finance. Princeton University Press, Princeton, NJ. p. 186. ISBN 0-691-12255-5.
[4] Steinbrecher, György; Shaw, William T. (19 March 2008). "Quantile mechanics". European Journal of Applied Mathematics. 19 (2). doi:10.1017/S0956792508007341. S2CID 6899308.
[5] Arridge, Simon; Maass, Peter; Öktem, Ozan; Schönlieb, Carola-Bibiane (2019). "Solving inverse problems using data-driven models". Acta Numerica. 28: 1–174. doi:10.1017/S0962492919000059. ISSN 0962-4929. S2CID 197480023.
[6] Luc Devroye (1986). "Section 2.2. Inversion by numerical solution of F(X) = U" (PDF). Non-Uniform Random Variate Generation. New York: Springer-Verlag.
[7] L.A. Grzelak, J.A.S. Witteveen, M. Suarez, and C.W. Oosterlee. The stochastic collocation Monte Carlo sampler: Highly efficient sampling from “expensive” distributions. https://ssrn.com/abstract=2529691
[8] Derflinger, Gerhard; Hörmann, Wolfgang; Leydold, Josef (2010). "Random variate generation by numerical inversion when only the density is known" (PDF). ACM Transactions on Modeling and Computer Simulation. 20 (4). doi:10.1145/945511.945517.
[9] "UNU.RAN - Universal Non-Uniform RANdom number generators".
[10] "Runuran: R Interface to the 'UNU.RAN' Random Variate Generators". 17 January 2023.
[11] "Random Number Generators (Scipy.stats.sampling) — SciPy v1.12.0 Manual".
[12] Baumgarten, Christoph; Patel, Tirth (2022). "Automatic random variate generation in Python". Proceedings of the 21st Python in Science Conference. pp. 46–51. doi:10.25080/majora-212e5952-007.
