Ratio of uniforms

Last updated November 27, 2024

The ratio of uniforms is a method initially proposed by Kinderman and Monahan in 1977^[1] for pseudo-random number sampling, that is, for drawing random samples from a statistical distribution. Like rejection sampling and inverse transform sampling, it is an exact simulation method. The basic idea of the method is to use a change of variables to create a bounded set, which can then be sampled uniformly to generate random variables following the original distribution. One feature of this method is that the distribution to sample is only required to be known up to an unknown multiplicative factor, a common situation in computational statistics and statistical physics.

Motivation

A convenient technique to sample a statistical distribution is rejection sampling. When the probability density function of the distribution is bounded and has finite support, one can define a bounding box around it (a uniform proposal distribution), draw uniform samples in the box and return only the x coordinates of the points that fall below the function (see graph). As a direct consequence of the fundamental theorem of simulation,^[2] the returned samples are distributed according to the original distribution.
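As an illustration of this bounded-support case, here is a minimal Python sketch. The target, the Beta(2, 2) density $p(x)=6x(1-x)$ on $[0,1]$ with bounding box $[0,1]\times[0,1.5]$, is an example chosen here, not taken from the text. The function names are hypothetical.

```python
import random


def rejection_sample_box(p, a, b, p_max, rng):
    """Draw one sample from the density p on [a, b] by uniform rejection sampling.

    Assumes p is a bounded density with support [a, b] and p(x) <= p_max there.
    """
    while True:
        x = rng.uniform(a, b)        # horizontal coordinate of a uniform point in the box
        y = rng.uniform(0.0, p_max)  # vertical coordinate of the same point
        if y <= p(x):                # the point falls below the graph of p
            return x                 # keep only its x coordinate


def beta22_pdf(x):
    """Beta(2, 2) density on [0, 1]; its maximum is 1.5, reached at x = 0.5."""
    return 6.0 * x * (1.0 - x)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rejection_sample_box(beta22_pdf, 0.0, 1.0, 1.5, rng) for _ in range(10_000)]
    print(sum(xs) / len(xs))  # sample mean, close to the true mean 0.5
```

Each proposal is accepted with probability equal to the area under the graph (1) divided by the box area (1.5), so about two thirds of the proposals are kept.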

When the support of the distribution is infinite, it is impossible to draw a rectangular bounding box containing the graph of the function. One can still use rejection sampling, but with a non-uniform proposal distribution. It can be delicate to choose an appropriate proposal distribution,^[3] and one also has to know how to efficiently sample this proposal distribution.

The method of the ratio of uniforms offers a solution to this problem by essentially using, as the proposal distribution, the distribution of the ratio of two uniform random variables.

Statement

The statement and the proof are adapted from the presentation by Gobet.^[4]

Theorem: Let $X$ be a multidimensional random variable with probability density function $p(x_{1},x_{2},\ldots ,x_{d})$ on $\mathbb {R} ^{d}$. The function $p$ is only required to be known up to a constant, so we can assume that we only know $f$, where $p(x_{1},x_{2},\ldots ,x_{d})=cf(x_{1},x_{2},\ldots ,x_{d})$ with $c$ a constant that is unknown or difficult to compute. Let $r>0$ be a parameter that can be tuned to improve the properties of the method, and define the set $A_{f,r}$: $A_{f,r}=\left\{(u,v_{1},v_{2},\ldots ,v_{d})\in \mathbb {R} ^{d+1}:0\leq u\leq f\left({\frac {v_{1}}{u^{r}}},{\frac {v_{2}}{u^{r}}},\ldots ,{\frac {v_{d}}{u^{r}}}\right)^{\frac {1}{1+rd}}\right\}$. The Lebesgue measure of the set $A_{f,r}$ is finite and equal to ${\frac {1}{c\,(1+rd)}}$.

Furthermore, let $(U,V_{1},V_{2},\ldots ,V_{d})$ be a random variable uniformly distributed on the set $A_{f,r}$ . Then, $\left({\frac {V_{1}}{U^{r}}},{\frac {V_{2}}{U^{r}}},\ldots ,{\frac {V_{d}}{U^{r}}}\right)$ is a random variable on $\mathbb {R} ^{d}$ distributed like $X$ .

Proof

We will first assume that the first statement is correct, i.e. $|A_{f,r}|={\frac {1}{c\,(1+rd)}}$ .

Let $\varphi$ be a bounded measurable function on $\mathbb {R} ^{d}$. Consider the expectation of $\varphi \left({\frac {V_{1}}{U^{r}}},\ldots ,{\frac {V_{d}}{U^{r}}}\right)$ when $(U,V_{1},\ldots ,V_{d})$ is uniformly distributed on $A_{f,r}$:

E\left[\varphi \left({\frac {V_{1}}{U^{r}}},\ldots ,{\frac {V_{d}}{U^{r}}}\right)\right]={\frac {1}{|A_{f,r}|}}\int _{-\infty }^{\infty }\int _{-\infty }^{\infty }\cdots \int _{-\infty }^{\infty }\varphi \left({\frac {v_{1}}{u^{r}}},\ldots ,{\frac {v_{d}}{u^{r}}}\right)\mathbf {1} _{(u,v_{1},\ldots ,v_{d})\in A_{f,r}}\mathrm {d} u\,\mathrm {d} v_{1}\ldots \mathrm {d} v_{d}

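With the change of variables $x_{i}={\frac {v_{i}}{u^{r}}}$ for fixed $u>0$, so that $\mathrm {d} v_{1}\cdots \mathrm {d} v_{d}=u^{rd}\,\mathrm {d} x_{1}\cdots \mathrm {d} x_{d}$, followed by the integration of $u^{rd}$ over $0\leq u\leq f(x_{1},\ldots ,x_{d})^{\frac {1}{1+rd}}$, we have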

{\begin{aligned}E\left[\varphi \left({\frac {V_{1}}{U^{r}}},\ldots ,{\frac {V_{d}}{U^{r}}}\right)\right]&={\frac {1}{|A_{f,r}|}}\int _{-\infty }^{\infty }\int _{-\infty }^{\infty }\cdots \int _{-\infty }^{\infty }\varphi (x_{1},\ldots ,x_{d})\mathbf {1} _{0\leq u\leq f(x_{1},\ldots ,x_{d})^{\frac {1}{1+rd}}}u^{rd}\mathrm {d} u\,\mathrm {d} x_{1}\cdots \mathrm {d} x_{d}\\&={\frac {1}{|A_{f,r}|}}\int _{-\infty }^{\infty }\cdots \int _{-\infty }^{\infty }\varphi \left(x_{1},\ldots ,x_{d}\right){\frac {1}{1+rd}}f\left(x_{1},\ldots ,x_{d}\right)\mathrm {d} x_{1}\cdots \mathrm {d} x_{d}\\&=\int _{-\infty }^{\infty }\ldots \int _{-\infty }^{\infty }\varphi \left(x_{1},\ldots ,x_{d}\right)p\left(x_{1},\ldots ,x_{d}\right)\mathrm {d} x_{1}\cdots \mathrm {d} x_{d}\end{aligned}}

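where we can see that $\left({\frac {V_{1}}{U^{r}}},\ldots ,{\frac {V_{d}}{U^{r}}}\right)$ indeed has density $p$.

Coming back to the first statement, the same change of variables applied to $|A_{f,r}|=\int \mathbf {1} _{(u,v_{1},\ldots ,v_{d})\in A_{f,r}}\,\mathrm {d} u\,\mathrm {d} v_{1}\cdots \mathrm {d} v_{d}$ (that is, the computation above with $\varphi \equiv 1$ and without the factor ${\frac {1}{|A_{f,r}|}}$) gives $|A_{f,r}|={\frac {1}{1+rd}}\int f(x_{1},\ldots ,x_{d})\,\mathrm {d} x_{1}\cdots \mathrm {d} x_{d}={\frac {1}{c\,(1+rd)}}$, since $p=cf$ integrates to 1.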
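The measure formula can be checked numerically. The following minimal Monte Carlo sketch covers $d=1$ and assumes the unnormalized standard normal $f(x)=\mathrm{e}^{-x^{2}/2}$, so $c=1/\sqrt{2\pi}$ and $|A_{f,r}|=\sqrt{2\pi}/(1+r)$. For this $f$, $\sup f^{1/(1+r)}=1$ and $\sup_x |x|\,f(x)^{r/(1+r)}=\sqrt{(1+r)/r}\,\mathrm{e}^{-1/2}$, reached at $x^{2}=(1+r)/r$. The names `estimate_area` and `gauss_kernel` are hypothetical.

```python
import math
import random


def estimate_area(f, r, u_max, v_min, v_max, n, rng):
    """Monte Carlo estimate of the Lebesgue measure of A_{f,r} in dimension d = 1.

    Assumes A_{f,r} is contained in the box [0, u_max] x [v_min, v_max].
    """
    hits = 0
    for _ in range(n):
        u = rng.uniform(0.0, u_max)
        v = rng.uniform(v_min, v_max)
        # (u, v) is in A_{f,r} iff 0 < u <= f(v / u^r)^(1 / (1 + r)); u = 0 has measure zero
        if u > 0.0 and u <= f(v / u**r) ** (1.0 / (1.0 + r)):
            hits += 1
    return u_max * (v_max - v_min) * hits / n


def gauss_kernel(x):
    """Standard normal density without its constant, i.e. f with c = 1 / sqrt(2 pi)."""
    return math.exp(-0.5 * x * x)


if __name__ == "__main__":
    r = 1.0
    v_bound = math.sqrt((1.0 + r) / r) * math.exp(-0.5)  # sup |x| f(x)^(r/(1+r))
    est = estimate_area(gauss_kernel, r, 1.0, -v_bound, v_bound, 200_000, random.Random(0))
    exact = math.sqrt(2.0 * math.pi) / (1.0 + r)          # 1 / (c (1 + r d)) with d = 1
    print(est, exact)
```

For $r=1$ the exact value is $\sqrt{2\pi}/2\approx 1.2533$. The estimate fluctuates around it with a standard error of a few thousandths at this sample size.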

Complements

Rejection sampling in $A_{f,r}$

The above statement does not specify how one should perform the uniform sampling in $A_{f,r}$. The appeal of the method is that, under mild conditions on $f$ (namely that $f(x_{1},x_{2},\ldots ,x_{d})^{\frac {1}{1+rd}}$ and, for all $i$, $x_{i}f(x_{1},x_{2},\ldots ,x_{d})^{\frac {r}{1+rd}}$ are bounded), $A_{f,r}$ is bounded. One can then define the rectangular bounding box ${\tilde {A}}_{f,r}$ such that $A_{f,r}\subset {\tilde {A}}_{f,r}=\left[0,\sup _{x_{1},\ldots ,x_{d}}{f(x_{1},\ldots ,x_{d})^{\frac {1}{1+rd}}}\right]\times \prod _{i}\left[\inf _{x_{1},\ldots ,x_{d}}{x_{i}f(x_{1},\ldots ,x_{d})^{\frac {r}{1+rd}}},\sup _{x_{1},\ldots ,x_{d}}{x_{i}f(x_{1},\ldots ,x_{d})^{\frac {r}{1+rd}}}\right]$. This allows one to sample $A_{f,r}$ uniformly by rejection sampling inside ${\tilde {A}}_{f,r}$, where each proposal is accepted with probability $|A_{f,r}|/|{\tilde {A}}_{f,r}|$. The parameter $r$ can be adjusted to change the shape of $A_{f,r}$ and maximize this acceptance ratio.
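Here is a minimal Python sketch of this rejection step for general $d$. It assumes the caller supplies the box bounds. The function name `rou_sample` and its arguments are illustrative and not taken from any library. The usage example targets a standard bivariate normal with $r=1$. For that target, $\sup f^{1/3}=1$ and $\sup_x |x_i|\,f(x)^{1/3}=\sqrt{3/\mathrm{e}}$, reached at $x_i=\pm\sqrt{3}$ and $x_j=0$.

```python
import math
import random


def rou_sample(f, d, r, u_max, v_bounds, rng):
    """One draw from the density proportional to f on R^d (ratio of uniforms).

    f        -- callable mapping a list of d floats to f(x) >= 0
    u_max    -- upper bound of f(x) ** (1 / (1 + r d)) over x
    v_bounds -- d pairs (lo_i, hi_i) bounding x_i * f(x) ** (r / (1 + r d))
    """
    exponent = 1.0 / (1.0 + r * d)
    while True:
        # uniform proposal in the bounding box of A_{f,r}
        u = rng.uniform(0.0, u_max)
        v = [rng.uniform(lo, hi) for lo, hi in v_bounds]
        if u == 0.0:
            continue  # v / u^r is undefined; this event has negligible probability
        x = [vi / u**r for vi in v]
        if u <= f(x) ** exponent:  # accept iff (u, v) lies in A_{f,r}
            return x


def bivariate_gauss(x):
    """Unnormalized standard bivariate normal density (c = 1 / (2 pi))."""
    return math.exp(-0.5 * (x[0] ** 2 + x[1] ** 2))


if __name__ == "__main__":
    # for d = 2, r = 1: sup f^(1/3) = 1 and sup |x_i| f^(1/3) = sqrt(3 / e)
    b = math.sqrt(3.0 / math.e)
    rng = random.Random(0)
    pts = [rou_sample(bivariate_gauss, 2, 1.0, 1.0, [(-b, b), (-b, b)], rng) for _ in range(5)]
    print(pts)
```

With these bounds the acceptance probability is $|A_{f,1}|/|{\tilde A}_{f,1}|=(2\pi/3)/(12/\mathrm{e})\approx 0.47$.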

Parametric description of the boundary of $A_{f,r}$

The definition of $A_{f,r}$ is already convenient for the rejection sampling step. For illustration purposes, it can be useful to draw the set, in which case one needs a parametric description of its boundary: $u=f\left(x_{1},x_{2},\ldots ,x_{d}\right)^{\frac {1}{1+rd}}\quad {\text{and}}\quad v_{i}=x_{i}u^{r}\ {\text{for all}}\ i\in \{1,\ldots ,d\}$, or, in the common case where $X$ is a one-dimensional variable, $(u,v)=\left(f(x)^{\frac {1}{1+r}},x\,f(x)^{\frac {r}{1+r}}\right)$.
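The one-dimensional parametrization is easy to tabulate for plotting. Below is a minimal sketch; the helper `boundary_points` is hypothetical. The usage example assumes the exponential density with $\lambda=2$, whose boundary for $r=1$ should satisfy $v=-{\frac {u}{\lambda }}\ln {\frac {u^{2}}{\lambda }}$.

```python
import math


def boundary_points(f, r, xs):
    """Points (u, v) on the boundary of A_{f,r} for d = 1, one per value of x.

    Uses the parametrization u = f(x)^(1/(1+r)), v = x * u^r = x f(x)^(r/(1+r)).
    """
    pts = []
    for x in xs:
        u = f(x) ** (1.0 / (1.0 + r))
        pts.append((u, x * u**r))
    return pts


if __name__ == "__main__":
    lam = 2.0

    def exp_pdf(x):
        return lam * math.exp(-lam * x) if x >= 0.0 else 0.0

    pts = boundary_points(exp_pdf, 1.0, [0.05 * k for k in range(200)])
    # the (u, v) pairs can be split into two lists and passed to any plotting routine
    print(pts[:3])
```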

Generalized ratio of uniforms

As stated above, the ratio of uniforms is parameterized only by $r$. It can be described more generally in terms of a transformation $g$.^[5] In the one-dimensional case, if $g:\mathbb {R} ^{+}\rightarrow \mathbb {R} ^{+}$ is a strictly increasing, differentiable function such that $g(0)=0$, one can define $A_{f,g}$ as

A_{f,g}=\left\{(u,v)\in \mathbb {R} ^{2}:0\leq u\leq g^{-1}\left[f\left({\frac {v}{g'(u)}}\right)\right]\right\}

If $(U,V)$ is a random variable uniformly distributed in $A_{f,g}$ , then ${\frac {V}{g'(U)}}$ is distributed with the density $p$ .
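A short derivation shows why this works. It mirrors the proof above, using the change of variables $x=v/g'(u)$ for fixed $u>0$. It relies on two additional assumptions not stated above: $g'(u)>0$ for $u>0$, and $g$ maps $\mathbb{R}^{+}$ onto $\mathbb{R}^{+}$, so that $g^{-1}[f(x)]$ is always defined. Here $\varphi$ is a bounded measurable function.

```latex
% Fix u > 0 and substitute x = v / g'(u), so that dv = g'(u) dx; swap integrals by Tonelli.
\begin{aligned}
|A_{f,g}| &= \int_{0}^{\infty}\int_{-\infty}^{\infty}
  \mathbf{1}_{u\leq g^{-1}\left[f\left(\frac{v}{g'(u)}\right)\right]}\,\mathrm{d}v\,\mathrm{d}u
  = \int_{-\infty}^{\infty}\int_{0}^{g^{-1}[f(x)]} g'(u)\,\mathrm{d}u\,\mathrm{d}x \\
&= \int_{-\infty}^{\infty}\Bigl(g\bigl(g^{-1}[f(x)]\bigr)-g(0)\Bigr)\,\mathrm{d}x
  = \int_{-\infty}^{\infty} f(x)\,\mathrm{d}x = \frac{1}{c}, \\
E\left[\varphi\left(\frac{V}{g'(U)}\right)\right]
&= \frac{1}{|A_{f,g}|}\int_{-\infty}^{\infty}\varphi(x)\int_{0}^{g^{-1}[f(x)]} g'(u)\,\mathrm{d}u\,\mathrm{d}x
  = c\int_{-\infty}^{\infty}\varphi(x)\,f(x)\,\mathrm{d}x
  = \int_{-\infty}^{\infty}\varphi(x)\,p(x)\,\mathrm{d}x .
\end{aligned}
```

For example, $g(u)=u^{1+r}/(1+r)$ gives $g'(u)=u^{r}$ and $g^{-1}(y)=((1+r)y)^{1/(1+r)}$. Then $A_{f,g}=A_{(1+r)f,r}$ and $V/g'(U)=V/U^{r}$. This recovers the $r$-parametrized method, with $f$ rescaled by the constant $1+r$, which is harmless because $f$ is only needed up to a constant.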

Examples

The exponential distribution

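Assume that we want to sample the exponential distribution, with density $p(x)=\lambda \mathrm {e} ^{-\lambda x}$ for $x\geq 0$ (and $p(x)=0$ otherwise), using the ratio of uniforms method. Here we take $r=1$, and since $p$ is known exactly we can use $f=p$, that is, $c=1$.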

We start by constructing the set $A_{f,1}$:

A_{f,1}=\left\{(u,v)\in \mathbb {R} ^{2}:0\leq u\leq {\sqrt {p\left({\frac {v}{u}}\right)}}\right\}

The condition $0\leq u\leq {\sqrt {p\left({\frac {v}{u}}\right)}}$ is equivalent, after taking logarithms, to $0\leq v\leq -{\frac {u}{\lambda }}\ln {\frac {u^{2}}{\lambda }}$, which allows us to plot the shape of the set (see graph).
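The equivalence can be checked step by step for $u>0$. Since $p$ vanishes for negative arguments, every point of $A_{f,1}$ with $u>0$ has $v/u\geq 0$, hence $v\geq 0$. The remaining steps are:

```latex
% For u > 0 and v >= 0 (so that v/u lies in the support of p).
\begin{aligned}
u \leq \sqrt{\lambda\,\mathrm{e}^{-\lambda v/u}}
&\iff u^{2} \leq \lambda\,\mathrm{e}^{-\lambda v/u}
 \iff \ln\frac{u^{2}}{\lambda} \leq -\frac{\lambda v}{u} \\
&\iff v \leq -\frac{u}{\lambda}\ln\frac{u^{2}}{\lambda}
 \qquad\text{(multiplying by } -u/\lambda < 0 \text{ reverses the inequality)}
\end{aligned}
```

The right-hand side is nonnegative only for $u\leq\sqrt{\lambda}$, which is consistent with $u\leq\sup\sqrt{p}=\sqrt{\lambda}$.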

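This inequality also allows us to determine the rectangular bounding box ${\tilde {A}}_{f,1}$ containing $A_{f,1}$. Indeed, with $h(u)=-{\frac {u}{\lambda }}\ln {\frac {u^{2}}{\lambda }}$, we have $h\left({\sqrt {\lambda }}\right)=0$, so $u$ ranges over $\left[0,{\sqrt {\lambda }}\right]$. Moreover, $h'(u)=-{\frac {1}{\lambda }}\left(\ln {\frac {u^{2}}{\lambda }}+2\right)$ vanishes at $u={\frac {\sqrt {\lambda }}{\mathrm {e} }}$, where $h$ reaches its maximum $h\left({\frac {\sqrt {\lambda }}{\mathrm {e} }}\right)={\frac {2}{\mathrm {e} {\sqrt {\lambda }}}}$. Hence ${\tilde {A}}_{f,1}=\left[0,{\sqrt {\lambda }}\right]\times \left[0,{\frac {2}{\mathrm {e} {\sqrt {\lambda }}}}\right]$.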

From here, we can draw pairs of uniform random variables $U\sim \mathrm {Unif} \left(0,{\sqrt {\lambda }}\right)$ and $V\sim \mathrm {Unif} \left(0,{\frac {2}{\mathrm {e} {\sqrt {\lambda }}}}\right)$ until $u\leq {\sqrt {\lambda \,\mathrm {e} ^{-\lambda {\frac {v}{u}}}}}$ , and when that happens, we return ${\frac {v}{u}}$ , which is exponentially distributed.
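The procedure just described translates directly into code. This is a minimal Python sketch; the function name `sample_exponential_rou` is hypothetical.

```python
import math
import random


def sample_exponential_rou(lam, rng):
    """One Exponential(lam) draw by the ratio of uniforms with r = 1."""
    u_max = math.sqrt(lam)                   # sup of sqrt(p)
    v_max = 2.0 / (math.e * math.sqrt(lam))  # sup of x * sqrt(p(x))
    while True:
        u = rng.uniform(0.0, u_max)
        v = rng.uniform(0.0, v_max)
        if u == 0.0:
            continue                         # v / u is undefined; negligible probability
        x = v / u
        if u <= math.sqrt(lam * math.exp(-lam * x)):  # (u, v) lies in A_{f,1}
            return x


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [sample_exponential_rou(2.0, rng) for _ in range(10_000)]
    print(sum(xs) / len(xs))  # close to the mean 1 / lam = 0.5
```

Since $|A_{f,1}|=1/2$ and $|{\tilde A}_{f,1}|=2/\mathrm{e}$, each pair is accepted with probability $\mathrm{e}/4\approx 0.68$. For $u>0$ and $v\geq 0$, the acceptance test can equivalently be written $v\leq -{\frac {u}{\lambda }}\ln {\frac {u^{2}}{\lambda }}$, which trades the exponential for a logarithm.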

A mixture of normal distributions

Consider the mixture of two normal distributions ${\mathcal {D}}=0.6\,N(\mu =-1,\sigma =2)+0.4\,N(\mu =3,\sigma =1)$ . To apply the method of the ratio of uniforms, with a given $r$ , one should first determine the boundaries of the rectangular bounding box ${\tilde {A}}_{f,r}$ enclosing the set $A_{f,r}$ . This can be done numerically, by computing the minimum and maximum of $u(x)=f(x)^{\frac {1}{1+r}}$ and $v(x)=x\,f(x)^{\frac {r}{1+r}}$ on a grid of values of $x$ . Then, one can draw uniform samples $(u,v)\in {\tilde {A}}_{f,r}$ , only keep those that fall inside the set $A_{f,r}$ and return them as ${\frac {v}{u^{r}}}$ .
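The steps above can be sketched in Python as follows. The grid choices are assumptions made for this sketch: $x\in[-20,20]$ with 20001 points, with the box bounds inflated by 1% because a grid can miss the exact extrema slightly. The helper names are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x):
    """Density of 0.6 N(mu=-1, sigma=2) + 0.4 N(mu=3, sigma=1); already normalized."""
    return 0.6 * normal_pdf(x, -1.0, 2.0) + 0.4 * normal_pdf(x, 3.0, 1.0)


def bounding_box(f, r, lo=-20.0, hi=20.0, n=20001, margin=1.01):
    """Estimate the box [0, u_max] x [v_min, v_max] enclosing A_{f,r} (d = 1) on a grid."""
    u_max = v_min = v_max = 0.0
    for k in range(n):
        x = lo + (hi - lo) * k / (n - 1)
        fx = f(x)
        u_max = max(u_max, fx ** (1.0 / (1.0 + r)))  # u(x) = f(x)^(1/(1+r))
        v = x * fx ** (r / (1.0 + r))                 # v(x) = x f(x)^(r/(1+r))
        v_min, v_max = min(v_min, v), max(v_max, v)
    return margin * u_max, margin * v_min, margin * v_max


def sample_rou(f, r, box, rng):
    """One draw from the density proportional to f, by rejection in the box."""
    u_max, v_min, v_max = box
    while True:
        u = rng.uniform(0.0, u_max)
        v = rng.uniform(v_min, v_max)
        if u == 0.0:
            continue
        x = v / u**r
        if u <= f(x) ** (1.0 / (1.0 + r)):  # keep only points inside A_{f,r}
            return x                         # and return them as v / u^r


if __name__ == "__main__":
    r = 1.0
    box = bounding_box(mixture_pdf, r)
    rng = random.Random(0)
    xs = [sample_rou(mixture_pdf, r, box, rng) for _ in range(10_000)]
    print(box, sum(xs) / len(xs))  # the mixture mean is 0.6 * (-1) + 0.4 * 3 = 0.6
```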

It is possible to optimize the acceptance ratio by adjusting the value of $r$ , as seen on the graphs.
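The acceptance ratio can also be computed without sampling. Since the mixture density is normalized ($c=1$), it equals $|A_{f,r}|/|{\tilde A}_{f,r}|=1/\left((1+r)\,|{\tilde A}_{f,r}|\right)$. Below is a minimal sketch that estimates the box on the same kind of grid. The grid range, the grid resolution and the function names are assumptions for illustration.

```python
import math


def mixture_pdf(x):
    """Normalized density of 0.6 N(mu=-1, sigma=2) + 0.4 N(mu=3, sigma=1), so c = 1."""
    def normal(mu, sigma):
        z = (x - mu) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return 0.6 * normal(-1.0, 2.0) + 0.4 * normal(3.0, 1.0)


def acceptance_ratio(f, r, c=1.0, lo=-20.0, hi=20.0, n=20001):
    """|A_{f,r}| / |box| for d = 1, with the bounding box estimated on a grid of x."""
    u_max = v_min = v_max = 0.0
    for k in range(n):
        x = lo + (hi - lo) * k / (n - 1)
        fx = f(x)
        u_max = max(u_max, fx ** (1.0 / (1.0 + r)))
        v = x * fx ** (r / (1.0 + r))
        v_min, v_max = min(v_min, v), max(v_max, v)
    area_a = 1.0 / (c * (1.0 + r))  # |A_{f,r}| from the theorem with d = 1
    return area_a / (u_max * (v_max - v_min))


if __name__ == "__main__":
    for r in (0.25, 0.5, 1.0, 1.5, 2.0, 3.0):
        print(f"r = {r}: acceptance ratio ~ {acceptance_ratio(mixture_pdf, r):.3f}")
```

As a sanity check, the same function applied to the standard exponential density with $r=1$ returns approximately $\mathrm{e}/4$, matching the exponential example.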

Software

The contributed R packages rust^[6] and Runuran^[7] provide implementations of the ratio-of-uniforms method.


References

[1] Kinderman, A. J.; Monahan, J. F. (September 1977). "Computer Generation of Random Variables Using the Ratio of Uniform Deviates". ACM Transactions on Mathematical Software. 3 (3): 257–260. doi:10.1145/355744.355750. S2CID 12884505.
[2] Robert, Christian; Casella, George (2004). Monte Carlo Statistical Methods (2nd ed.). Springer-Verlag. p. 47. ISBN 978-0-387-21239-5.
[3] Martino, Luca; Luengo, David; Míguez, Joaquín (16 July 2013). "On the Generalized Ratio of Uniforms as a Combination of Transformed Rejection and Extended Inverse of Density Sampling". p. 13. arXiv:1205.0482 [stat.CO].
[4] Gobet, Emmanuel (2020). Monte-Carlo Methods and Stochastic Processes: From Linear to Non-Linear. CRC Press. ISBN 978-0-367-65846-5. OCLC 1178639517.
[5] Wakefield, J. C.; Gelfand, A. E.; Smith, A. F. M. (1 December 1991). "Efficient generation of random variates via the ratio-of-uniforms method". Statistics and Computing. 1 (2): 129–133. doi:10.1007/BF01889987. ISSN 1573-1375. S2CID 119824513.
[6] Northrop, P. J. (2021). rust: Ratio-of-Uniforms Simulation with Transformation.
[7] Leydold, J.; Hörmann, W. (2021). Runuran: R Interface to the 'UNU.RAN' Random Variate Generators.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
