Inverse transform sampling

Last updated

Inverse transform sampling (also known as inversion sampling, the inverse probability integral transform, the inverse transformation method, Smirnov transform, or the golden rule [1] ) is a basic method for pseudo-random number sampling, i.e., for generating sample numbers at random from any probability distribution given its cumulative distribution function.

Contents

Inverse transformation sampling takes uniform samples of a number between 0 and 1, interpreted as a probability, and then returns the smallest number such that for the cumulative distribution function of a random variable. For example, imagine that is the standard normal distribution with mean zero and standard deviation one. The table below shows samples taken from the uniform distribution and their representation on the standard normal distribution.

Transformation from uniform sample to normal
.50
.9751.95996
.9952.5758
.9999994.75342
1-2−528.12589
Inverse transform sampling for normal distribution Inverse transform sampling.png
Inverse transform sampling for normal distribution

We are randomly choosing a proportion of the area under the curve and returning the number in the domain such that exactly this proportion of the area occurs to the left of that number. Intuitively, we are unlikely to choose a number in the far end of tails because there is very little area in them which would require choosing a number very close to zero or one.

Computationally, this method involves computing the quantile function of the distribution — in other words, computing the cumulative distribution function (CDF) of the distribution (which maps a number in the domain to a probability between 0 and 1) and then inverting that function. This is the source of the term "inverse" or "inversion" in most of the names for this method. Note that for a discrete distribution, computing the CDF is not in general too difficult: we simply add up the individual probabilities for the various points of the distribution. For a continuous distribution, however, we need to integrate the probability density function (PDF) of the distribution, which is impossible to do analytically for most distributions (including the normal distribution). As a result, this method may be computationally inefficient for many distributions and other methods are preferred; however, it is a useful method for building more generally applicable samplers such as those based on rejection sampling.

For the normal distribution, the lack of an analytical expression for the corresponding quantile function means that other methods (e.g. the Box–Muller transform) may be preferred computationally. It is often the case that, even for simple distributions, the inverse transform sampling method can be improved on: [2] see, for example, the ziggurat algorithm and rejection sampling. On the other hand, it is possible to approximate the quantile function of the normal distribution extremely accurately using moderate-degree polynomials, and in fact the method of doing this is fast enough that inversion sampling is now the default method for sampling from a normal distribution in the statistical package R. [3]

Formal statement

For any random variable , the random variable has the same distribution as , where is the generalized inverse of the cumulative distribution function of and is uniform on . [4]

For continuous random variables, the inverse probability integral transform is indeed the inverse of the probability integral transform, which states that for a continuous random variable with cumulative distribution function , the random variable is uniform on .

Graph of the inversion technique from
x
{\displaystyle x}
to
F
(
x
)
{\displaystyle F(x)}
. On the bottom right we see the regular function and in the top left its inversion. InverseFunc.png
Graph of the inversion technique from to . On the bottom right we see the regular function and in the top left its inversion.

Intuition

From , we want to generate with CDF We assume to be a continuous, strictly increasing function, which provides good intuition.

We want to see if we can find some strictly monotone transformation , such that . We will have

where the last step used that when is uniform on .

So we got to be the inverse function of , or, equivalently

Therefore, we can generate from

The method

Schematic of the inverse transform sampling. The inverse function of
y
=
F
X
(
x
)
{\displaystyle y=F_{X}(x)}
can be defined by
F
X
-
1
(
y
)
=
i
n
f
{
x
|
F
X
(
x
)
>=
y
}
{\displaystyle F_{X}^{-1}(y)=\mathrm {inf} \{x|F_{X}(x)\geq y\}}
. Generalized inversion method.svg
Schematic of the inverse transform sampling. The inverse function of can be defined by .
An animation of how inverse transform sampling generates normally distributed random values from uniformly distributed random values Inverse Transform Sampling Example.gif
An animation of how inverse transform sampling generates normally distributed random values from uniformly distributed random values

The problem that the inverse transform sampling method solves is as follows:

The inverse transform sampling method works as follows:

  1. Generate a random number from the standard uniform distribution in the interval , i.e. from
  2. Find the generalized inverse of the desired CDF, i.e. .
  3. Compute . The computed random variable has distribution and thereby the same law as .

Expressed differently, given a cumulative distribution function and a uniform variable , the random variable has the distribution . [4]

In the continuous case, a treatment of such inverse functions as objects satisfying differential equations can be given. [5] Some such differential equations admit explicit power series solutions, despite their non-linearity. [6]

Examples

In order to perform an inversion we want to solve for
From here we would perform steps one, two and three.
It means that if we draw some from a and compute This has exponential distribution.
The idea is illustrated in the following graph:
Random numbers yi are generated from a uniform distribution between 0 and 1, i.e. Y ~ U(0, 1). They are sketched as colored points on the y-axis. Each of the points is mapped according to x=F (y), which is shown with gray arrows for two example points. In this example, we have used an exponential distribution. Hence, for x >= 0, the probability density is
r
X
(
x
)
=
l
e
-
l
x
{\displaystyle \varrho _{X}(x)=\lambda e^{-\lambda \,x}}
and the cumulative distribution function is
F
(
x
)
=
1
-
e
-
l
x
{\displaystyle F(x)=1-e^{-\lambda \,x}}
. Therefore,
x
=
F
-
1
(
y
)
=
-
ln
[?]
(
1
-
y
)
l
{\displaystyle x=F^{-1}(y)=-{\frac {\ln(1-y)}{\lambda }}}
. We can see that using this method, many points end up close to 0 and only few points end up having high x-values - just as it is expected for an exponential distribution. Inverse transformation method for exponential distribution.jpg
Random numbers yi are generated from a uniform distribution between 0 and 1, i.e. Y ~ U(0, 1). They are sketched as colored points on the y-axis. Each of the points is mapped according to x=F (y), which is shown with gray arrows for two example points. In this example, we have used an exponential distribution. Hence, for x ≥ 0, the probability density is and the cumulative distribution function is . Therefore, . We can see that using this method, many points end up close to 0 and only few points end up having high x-values - just as it is expected for an exponential distribution.
Note that the distribution does not change if we start with 1-y instead of y. For computational purposes, it therefore suffices to generate random numbers y in [0, 1] and then simply calculate

Proof of correctness

Let be a cumulative distribution function, and let be its generalized inverse function (using the infimum because CDFs are weakly monotonic and right-continuous): [7]

Claim: If is a uniform random variable on then has as its CDF.

Proof:

Truncated distribution

Inverse transform sampling can be simply extended to cases of truncated distributions on the interval without the cost of rejection sampling: the same algorithm can be followed, but instead of generating a random number uniformly distributed between 0 and 1, generate uniformly distributed between and , and then again take .

Reduction of the number of inversions

In order to obtain a large number of samples, one needs to perform the same number of inversions of the distribution. One possible way to reduce the number of inversions while obtaining a large number of samples is the application of the so-called Stochastic Collocation Monte Carlo sampler (SCMC sampler) within a polynomial chaos expansion framework. This allows us to generate any number of Monte Carlo samples with only a few inversions of the original distribution with independent samples of a variable for which the inversions are analytically available, for example the standard normal variable. [8]

Software implementations

There are software implementations available for applying the inverse sampling method by using numerical approximations of the inverse in the case that it is not available in closed form. For example, an approximation of the inverse can be computed if the user provides some information about the distributions such as the PDF [9] or the CDF.

See also

Related Research Articles

<span class="mw-page-title-main">Cumulative distribution function</span> Probability that random variable X is less than or equal to x

In probability theory and statistics, the cumulative distribution function (CDF) of a real-valued random variable , or just distribution function of , evaluated at , is the probability that will take a value less than or equal to .

<span class="mw-page-title-main">Cauchy distribution</span> Probability distribution

The Cauchy distribution, named after Augustin Cauchy, is a continuous probability distribution. It is also known, especially among physicists, as the Lorentz distribution, Cauchy–Lorentz distribution, Lorentz(ian) function, or Breit–Wigner distribution. The Cauchy distribution is the distribution of the x-intercept of a ray issuing from with a uniformly distributed angle. It is also the distribution of the ratio of two independent normally distributed random variables with mean zero.

<span class="mw-page-title-main">Probability distribution</span> Mathematical function for the probability a given outcome occurs in an experiment

In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon in terms of its sample space and the probabilities of events.

<span class="mw-page-title-main">Random variable</span> Variable representing a random phenomenon

A random variable is a mathematical formalization of a quantity or object which depends on random events. The term 'random variable' can be misleading as its mathematical definition is not actually random nor a variable, but rather it is a function from possible outcomes in a sample space to a measurable space, often to the real numbers.

<span class="mw-page-title-main">Order statistic</span> Kth smallest value in a statistical sample

In statistics, the kth order statistic of a statistical sample is equal to its kth-smallest value. Together with rank statistics, order statistics are among the most fundamental tools in non-parametric statistics and inference.

<span class="mw-page-title-main">Gumbel distribution</span> Particular case of the generalized extreme value distribution

In probability theory and statistics, the Gumbel distribution is used to model the distribution of the maximum of a number of samples of various distributions.

In numerical analysis and computational statistics, rejection sampling is a basic technique used to generate observations from a distribution. It is also commonly called the acceptance-rejection method or "accept-reject algorithm" and is a type of exact simulation method. The method works for any distribution in with a density.

<span class="mw-page-title-main">Lévy distribution</span> Probability distribution

In probability theory and statistics, the Lévy distribution, named after Paul Lévy, is a continuous probability distribution for a non-negative random variable. In spectroscopy, this distribution, with frequency as the dependent variable, is known as a van der Waals profile. It is a special case of the inverse-gamma distribution. It is a stable distribution.

<span class="mw-page-title-main">Continuous uniform distribution</span> Uniform distribution on an interval

In probability theory and statistics, the continuous uniform distributions or rectangular distributions are a family of symmetric probability distributions. Such a distribution describes an experiment where there is an arbitrary outcome that lies between certain bounds. The bounds are defined by the parameters, and which are the minimum and maximum values. The interval can either be closed or open. Therefore, the distribution is often abbreviated where stands for uniform distribution. The difference between the bounds defines the interval length; all intervals of the same length on the distribution's support are equally probable. It is the maximum entropy probability distribution for a random variable under no constraint other than that it is contained in the distribution's support.

In probability theory and statistics, a copula is a multivariate cumulative distribution function for which the marginal probability distribution of each variable is uniform on the interval [0, 1]. Copulas are used to describe/model the dependence (inter-correlation) between random variables. Their name, introduced by applied mathematician Abe Sklar in 1959, comes from the Latin for "link" or "tie", similar but unrelated to grammatical copulas in linguistics. Copulas have been used widely in quantitative finance to model and minimize tail risk and portfolio-optimization applications.

<span class="mw-page-title-main">Kumaraswamy distribution</span>

In probability and statistics, the Kumaraswamy's double bounded distribution is a family of continuous probability distributions defined on the interval (0,1). It is similar to the beta distribution, but much simpler to use especially in simulation studies since its probability density function, cumulative distribution function and quantile functions can be expressed in closed form. This distribution was originally proposed by Poondi Kumaraswamy for variables that are lower and upper bounded with a zero-inflation. This was extended to inflations at both extremes [0,1] in later work with S. G. Fletcher.

<span class="mw-page-title-main">Empirical distribution function</span> Distribution function associated with the empirical measure of a sample

In statistics, an empirical distribution function is the distribution function associated with the empirical measure of a sample. This cumulative distribution function is a step function that jumps up by 1/n at each of the n data points. Its value at any specified value of the measured variable is the fraction of observations of the measured variable that are less than or equal to the specified value.

<span class="mw-page-title-main">Q–Q plot</span> Plot of the empirical distribution of p-values against the theoretical one

In statistics, a Q–Q plot (quantile–quantile plot) is a probability plot, a graphical method for comparing two probability distributions by plotting their quantiles against each other. A point (x, y) on the plot corresponds to one of the quantiles of the second distribution (y-coordinate) plotted against the same quantile of the first distribution (x-coordinate). This defines a parametric curve where the parameter is the index of the quantile interval.

In statistics, binomial regression is a regression analysis technique in which the response has a binomial distribution: it is the number of successes in a series of independent Bernoulli trials, where each trial has probability of success . In binomial regression, the probability of a success is related to explanatory variables: the corresponding concept in ordinary regression is to relate the mean value of the unobserved response to explanatory variables.

<span class="mw-page-title-main">Inverse Gaussian distribution</span> Family of continuous probability distributions

In probability theory, the inverse Gaussian distribution is a two-parameter family of continuous probability distributions with support on (0,∞).

<span class="mw-page-title-main">Dvoretzky–Kiefer–Wolfowitz inequality</span> Statistical inequality

In the theory of probability and statistics, the Dvoretzky–Kiefer–Wolfowitz–Massart inequality provides a bound on the worst case distance of an empirically determined distribution function from its associated population distribution function. It is named after Aryeh Dvoretzky, Jack Kiefer, and Jacob Wolfowitz, who in 1956 proved the inequality

In probability theory, the probability integral transform relates to the result that data values that are modeled as being random variables from any given continuous distribution can be converted to random variables having a standard uniform distribution. This holds exactly provided that the distribution being used is the true distribution of the random variables; if the distribution is one fitted to the data, the result will hold approximately in large samples.

<span class="mw-page-title-main">Quantile function</span> Statistical function that defines the quantiles of a probability distribution

In probability and statistics, the quantile function outputs the value of a random variable such that its probability is less than or equal to an input probability value. Intuitively, the quantile function associates with a range at and below a probability input the likelihood that a random variable is realized in that range for some probability distribution. It is also called the percentile function, percent-point function, inverse cumulative distribution function or inverse distribution function.

A product distribution is a probability distribution constructed as the distribution of the product of random variables having two other known distributions. Given two statistically independent random variables X and Y, the distribution of the random variable Z that is formed as the product is a product distribution.

A copula is a mathematical function that provides a relationship between marginal distributions of random variables and their joint distributions. Copulas are important because it represents a dependence structure without using marginal distributions. Copulas have been widely used in the field of finance, but their use in signal processing is relatively new. Copulas have been employed in the field of wireless communication for classifying radar signals, change detection in remote sensing applications, and EEG signal processing in medicine. In this article, a short introduction to copulas is presented, followed by a mathematical derivation to obtain copula density functions, and then a section with a list of copula density functions with applications in signal processing.

References

  1. Aalto University, N. Hyvönen, Computational methods in inverse problems. Twelfth lecture https://noppa.tkk.fi/noppa/kurssi/mat-1.3626/luennot/Mat-1_3626_lecture12.pdf%5B%5D
  2. Luc Devroye (1986). Non-Uniform Random Variate Generation (PDF). New York: Springer-Verlag. Archived from the original (PDF) on 2014-08-18. Retrieved 2012-04-12.
  3. "R: Random Number Generation".
  4. 1 2 McNeil, Alexander J.; Frey, Rüdiger; Embrechts, Paul (2005). Quantitative risk management. Princeton Series in Finance. Princeton University Press, Princeton, NJ. p. 186. ISBN   0-691-12255-5.
  5. Steinbrecher, György; Shaw, William T. (19 March 2008). "Quantile mechanics". European Journal of Applied Mathematics. 19 (2). doi:10.1017/S0956792508007341. S2CID   6899308.
  6. Arridge, Simon; Maass, Peter; Öktem, Ozan; Schönlieb, Carola-Bibiane (2019). "Solving inverse problems using data-driven models". Acta Numerica. 28: 1–174. doi: 10.1017/S0962492919000059 . ISSN   0962-4929. S2CID   197480023.
  7. Luc Devroye (1986). "Section 2.2. Inversion by numerical solution of F(X) = U" (PDF). Non-Uniform Random Variate Generation. New York: Springer-Verlag.
  8. L.A. Grzelak, J.A.S. Witteveen, M. Suarez, and C.W. Oosterlee. The stochastic collocation Monte Carlo sampler: Highly efficient sampling from “expensive” distributions. https://ssrn.com/abstract=2529691
  9. Derflinger, Gerhard; Hörmann, Wolfgang; Leydold, Josef (2010). "Random variate generation by numerical inversion when only the density is known". ACM Transactions on Modeling and Computer Simulation. 20 (4). doi:10.1145/945511.945517.
  10. https://statmath.wu.ac.at/unuran/index.html
  11. https://cran.r-project.org/package=Runuran
  12. https://docs.scipy.org/doc/scipy/reference/stats.sampling.html
  13. Baumgarten, Christoph; Patel, Tirth (2022). "Automatic random variate generation in Python". Proceedings of the 21st Python in Science Conference. doi:10.25080/majora-212e5952-007.