Ziggurat algorithm


The ziggurat algorithm is an algorithm for pseudo-random number sampling. Belonging to the class of rejection sampling algorithms, it relies on an underlying source of uniformly-distributed random numbers, typically from a pseudo-random number generator, as well as precomputed tables. The algorithm is used to generate values from a monotonically decreasing probability distribution. It can also be applied to symmetric unimodal distributions, such as the normal distribution, by choosing a value from one half of the distribution and then randomly choosing which half the value is considered to have been drawn from. It was developed by George Marsaglia and others in the 1960s.


A typical value produced by the algorithm only requires the generation of one random floating-point value and one random table index, followed by one table lookup, one multiply operation and one comparison. Sometimes (2.5% of the time, in the case of a normal or exponential distribution when using typical table sizes)[citation needed] more computations are required. Nevertheless, the algorithm is computationally much faster[citation needed] than the two most commonly used methods of generating normally distributed random numbers, the Marsaglia polar method and the Box–Muller transform, which require at least one logarithm and one square root calculation for each pair of generated values. However, since the ziggurat algorithm is more complex to implement, it is best used when large quantities of random numbers are required.

The term ziggurat algorithm dates from Marsaglia's paper with Wai Wan Tsang in 2000; it is so named because it is conceptually based on covering the probability distribution with rectangular segments stacked in decreasing order of size, resulting in a figure that resembles a ziggurat.

The ziggurat algorithm used to generate sample values with a normal distribution. (Only positive values are shown for simplicity.) The pink dots are initially uniformly distributed random numbers. The desired distribution function is first segmented into equal areas "A". One layer i is selected at random by the uniform source at the left. Then a random value from the top source is multiplied by the width of the chosen layer, and the result x is tested to see which region of the slice it falls into, with three possible outcomes: 1) (left, solid black region) the sample is clearly under the curve and is passed directly to output; 2) (right, vertically striped region) the sample value may lie under the curve and must be tested further: a random y value within the chosen layer is generated and compared to f(x), and if it is less, the point is under the curve and the value x is output; 3) otherwise, the chosen point x is rejected and the algorithm is restarted from the beginning.

Theory of operation

The ziggurat algorithm is a rejection sampling algorithm; it randomly generates a point in a distribution slightly larger than the desired distribution, then tests whether the generated point is inside the desired distribution. If not, it tries again. Given a random point underneath a probability density curve, its x coordinate is a random number with the desired distribution.

The distribution the ziggurat algorithm chooses from is made up of n equal-area regions; n − 1 rectangles that cover the bulk of the desired distribution, on top of a non-rectangular base that includes the tail of the distribution.

Given a monotone decreasing probability density function f(x), defined for all x ≥ 0, the base of the ziggurat is defined as all points inside the distribution and below y1 = f(x1). This consists of a rectangular region from (0, 0) to (x1, y1), and the (typically infinite) tail of the distribution, where x > x1 (and y < y1).

This layer (call it layer 0) has area A. On top of this, add a rectangular layer of width x1 and height A/x1, so it also has area A. The top of this layer is at height y2 = y1 + A/x1, and intersects the density function at a point (x2, y2), where y2 = f(x2). This layer includes every point in the density function between y1 and y2, but (unlike the base layer) also includes points such as (x1, y2) which are not in the desired distribution.

Further layers are then stacked on top. To use a precomputed table of size n (n = 256 is typical), one chooses x1 such that xn = 0, meaning that the top box, layer n − 1, reaches the distribution's peak at (0, f(0)) exactly.

Layer i extends vertically from yi to yi+1, and can be divided into two regions horizontally: the (generally larger) portion from 0 to xi+1 which is entirely contained within the desired distribution, and the (small) portion from xi+1 to xi, which is only partially contained.

Ignoring for a moment the problem of layer 0, and given uniform random variables U0 and U1 ∈ [0,1), the ziggurat algorithm can be described as:

  1. Choose a random layer 0 ≤ i < n.
  2. Let x = U0xi.
  3. If x < xi+1, return x.
  4. Let y = yi + U1(yi+1 − yi).
  5. Compute f(x). If y < f(x), return x.
  6. Otherwise, choose new random numbers and go back to step 1.

Step 1 amounts to choosing a low-resolution y coordinate. Step 3 tests if the x coordinate is clearly within the desired density function without knowing more about the y coordinate. If it is not, step 4 chooses a high-resolution y coordinate, and step 5 does the rejection test.

With closely spaced layers, the algorithm terminates at step 3 a very large fraction of the time. For the top layer n − 1, however, this test always fails, because xn = 0.
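
As an illustration, the simplified procedure can be sketched in Python roughly as follows. This is only a sketch, not a reference implementation: the density f, the table size n, and the tables x[0..n] and y[0..n] (with x[n] = 0 and y[i] = f(x[i]) for i ≥ 1) are assumed to have been precomputed, and the layer-0 tail is not yet handled.

    import random

    def ziggurat_simplified(f, x, y, n):
        """One sample from the simplified ziggurat (layer-0 tail not handled).

        f -- monotone decreasing density defined for x >= 0
        x -- table x[0..n] of layer widths, with x[n] = 0
        y -- table y[0..n] of layer boundaries, y[i] = f(x[i]) for i >= 1
        """
        while True:
            i = random.randrange(n)                  # step 1: random layer 0 <= i < n
            xs = random.random() * x[i]              # step 2: x = U0 * x_i
            if xs < x[i + 1]:                        # step 3: clearly under the curve
                return xs
            ys = y[i] + random.random() * (y[i + 1] - y[i])   # step 4: full-resolution y
            if ys < f(xs):                           # step 5: exact rejection test
                return xs
            # step 6: reject and retry with fresh random numbers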

Layer 0 can also be divided into a central region and an edge, but the edge is an infinite tail. To use the same algorithm to check if the point is in the central region, generate a fictitious x0 = A/y1. This will generate points with x < x1 with the correct frequency, and in the rare case that layer 0 is selected and x ≥ x1, use a special fallback algorithm to select a point at random from the tail. Because the fallback algorithm is used less than one time in a thousand, speed is not essential.

Thus, the full ziggurat algorithm for one-sided distributions is:

  1. Choose a random layer 0 ≤ i < n.
  2. Let x = U0xi.
  3. If x < xi+1, return x.
  4. If i = 0, generate a point from the tail using the fallback algorithm.
  5. Let y = yi + U1(yi+1 − yi).
  6. Compute f(x). If y < f(x), return x.
  7. Otherwise, choose new random numbers and go back to step 1.

For a two-sided distribution, the result must be negated 50% of the time. This can often be done conveniently by choosing U0 ∈ (−1,1) and, in step 3, testing if |x| < xi+1.
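
Combining the steps above, a hedged Python sketch of the full algorithm for a symmetric distribution might look like the following; the tables are assumed as before, the tail sampler sample_tail is a placeholder for the fallback discussed in the next section, and the sign comes from drawing U0 from (−1, 1).

    import random

    def ziggurat(f, x, y, n, sample_tail):
        """One sample from a symmetric unimodal distribution (sketch).

        x, y        -- precomputed tables with x[n] = 0, y[0] = 0, x[0] = A / y[1]
        sample_tail -- fallback returning a positive sample from the tail x > x[1]
        """
        while True:
            i = random.randrange(n)                  # step 1: random layer
            u0 = 2.0 * random.random() - 1.0         # U0 in (-1, 1) carries the sign
            xs = u0 * x[i]                           # step 2
            if abs(xs) < x[i + 1]:                   # step 3: clearly inside
                return xs
            if i == 0:                               # step 4: base layer, sample the tail
                t = sample_tail()
                return t if u0 > 0.0 else -t
            ys = y[i] + random.random() * (y[i + 1] - y[i])   # step 5
            if ys < f(abs(xs)):                      # step 6: exact rejection test
                return xs
            # step 7: reject and restart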

Fallback algorithms for the tail

Because the ziggurat algorithm generates most, but not all, outputs very rapidly and requires a fallback algorithm whenever x > x1, it is always more complex than a more direct implementation. The specific fallback algorithm depends on the distribution.

For an exponential distribution, the tail looks just like the body of the distribution. One way is to fall back to the most elementary algorithm E = −ln(U1) and let x = x1 − ln(U1). Another is to call the ziggurat algorithm recursively and add x1 to the result.
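
A minimal sketch of that shifted-exponential fallback in Python, where x1 denotes the rightmost table value and the uniform variate is kept away from zero so the logarithm is defined:

    import math
    import random

    def exponential_tail(x1):
        # The exponential distribution is memoryless, so the tail beyond x1
        # is just a shifted copy of the whole distribution: draw a standard
        # exponential variate -ln(U) and shift it by x1.
        u = 1.0 - random.random()      # uniform in (0, 1]
        return x1 - math.log(u)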

For a normal distribution, Marsaglia suggests a compact algorithm:

  1. Let x = −ln(U1)/x1.
  2. Let y = −ln(U2).
  3. If 2y > x², return x + x1.
  4. Otherwise, go back to step 1.

Since x1 ≈ 3.5 for typical table sizes, the test in step 3 is almost always successful.
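
A direct Python transcription of this fallback, again only as a sketch; x1 is the rightmost table value, and the uniform variates are drawn from (0, 1] to keep the logarithms defined:

    import math
    import random

    def normal_tail(x1):
        """Sample from the tail x > x1 of the (unnormalized) normal density."""
        while True:
            u1 = 1.0 - random.random()           # uniform in (0, 1]
            u2 = 1.0 - random.random()
            xt = -math.log(u1) / x1              # step 1
            yt = -math.log(u2)                   # step 2
            if 2.0 * yt > xt * xt:               # step 3: accept
                return xt + x1
            # step 4: otherwise try again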

Optimizations

The algorithm can be performed efficiently with precomputed tables of xi and yi = f(xi), but there are some modifications to make it even faster.

Generating the tables

It is possible to store the entire table precomputed, or just include the values n, y1, A, and an implementation of f^−1(y) in the source code, and compute the remaining values when initializing the random number generator.

As previously described, you can find xi = f^−1(yi) and yi+1 = yi + A/xi. Repeat n − 1 times for the layers of the ziggurat. At the end, you should have yn = f(0). There will be some round-off error, but it is a useful sanity test to see that it is acceptably small.

When actually filling in the table values, just assume that xn = 0 and yn = f(0), and accept the slight difference in layer n − 1's area as rounding error.
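
A sketch of this table construction in Python, assuming x1 and A have already been determined (see the next subsection) and that f_inv denotes f^−1 on the relevant range:

    def build_tables(f, f_inv, x1, A, n):
        """Build the tables x[0..n] and y[0..n] from x1 and the layer area A (sketch)."""
        x = [0.0] * (n + 1)
        y = [0.0] * (n + 1)
        x[1], y[1] = x1, f(x1)
        x[0], y[0] = A / y[1], 0.0        # fictitious base-layer width; base sits at y = 0
        for i in range(1, n):             # n - 1 layer tops: y[i+1] = y[i] + A / x[i]
            y[i + 1] = y[i] + A / x[i]
            if i + 1 < n:
                x[i + 1] = f_inv(y[i + 1])
        # y[n] should now be close to f(0); the gap is the rounding error
        # mentioned above. Force the exact top values and accept the slight
        # difference in layer n - 1's area.
        x[n], y[n] = 0.0, f(0.0)
        return x, y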

Finding x1 and A

Given an initial (guess at) x1, you need a way to compute the area t of the tail for which x > x1. For the exponential distribution, this is just e^(−x1), while for the normal distribution, assuming you are using the unnormalized f(x) = e^(−x²/2), this is √(π/2)·erfc(x1/√2). For more awkward distributions, numerical integration may be required.

With this in hand, from x1, you can find y1 = f(x1), the area t in the tail, and the area of the base layer A = x1y1 + t.
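
For the unnormalized normal density, these quantities can be computed directly; a sketch using the standard-library erfc function:

    import math

    def base_area_normal(x1):
        """Area A of the base layer for f(x) = e^(-x^2/2), given a trial x1."""
        y1 = math.exp(-0.5 * x1 * x1)                                    # y1 = f(x1)
        t = math.sqrt(math.pi / 2.0) * math.erfc(x1 / math.sqrt(2.0))    # tail area beyond x1
        return x1 * y1 + t                                               # A = x1*y1 + t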

Then compute the series yi and xi as above. If yi > f(0) for any i < n, then the initial estimate x1 was too low, leading to too large an area A. If yn < f(0), then the initial estimate x1 was too high.

Given this, use a root-finding algorithm (such as the bisection method) to find the value x1 which produces yn−1 as close to f(0) as possible. Alternatively, look for the value which makes the area of the topmost layer, xn−1(f(0) − yn−1), as close to the desired value A as possible. This saves one evaluation of f^−1(x) and is actually the condition of greatest interest.
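
A sketch of this search for the unnormalized normal density, using plain bisection on x1 and the topmost-box criterion; the bracket [1, 10] and the iteration count are arbitrary choices that comfortably cover typical table sizes:

    import math

    def find_x1_normal(n, lo=1.0, hi=10.0, iters=80):
        """Find x1 for f(x) = e^(-x^2/2) with n layers by bisection (sketch)."""
        f = lambda v: math.exp(-0.5 * v * v)
        f_inv = lambda w: math.sqrt(-2.0 * math.log(w))

        def residual(x1):
            # Area of the base layer: A = x1*y1 + tail area beyond x1.
            y1 = f(x1)
            t = math.sqrt(math.pi / 2.0) * math.erfc(x1 / math.sqrt(2.0))
            A = x1 * y1 + t
            # Walk up the layers to (x[n-1], y[n-1]).
            xi, yi = x1, y1
            for _ in range(n - 2):
                yi += A / xi
                if yi >= f(0.0):           # overshot f(0): x1 was too low
                    return -float("inf")
                xi = f_inv(yi)
            # Topmost-box area minus A; positive means x1 was too high.
            return xi * (f(0.0) - yi) - A

        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if residual(mid) > 0.0:
                hi = mid                   # x1 too high, move the bracket down
            else:
                lo = mid                   # x1 too low (or overshoot), move up
        return 0.5 * (lo + hi)

With x1 in hand, A follows from the base-layer computation sketched above, and the tables can then be filled in as described in the previous subsection.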

