Pseudorandom number generator

Last updated

A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG), [1] is an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers. The PRNG-generated sequence is not truly random, because it is completely determined by an initial value, called the PRNG's seed (which may include truly random values). Although sequences that are closer to truly random can be generated using hardware random number generators, pseudorandom number generators are important in practice for their speed in number generation and their reproducibility. [2]

Contents

PRNGs are central in applications such as simulations (e.g. for the Monte Carlo method), electronic games (e.g. for procedural generation), and cryptography. Cryptographic applications require the output not to be predictable from earlier outputs, and more elaborate algorithms, which do not inherit the linearity of simpler PRNGs, are needed.

Good statistical properties are a central requirement for the output of a PRNG. In general, careful mathematical analysis is required to have any confidence that a PRNG generates numbers that are sufficiently close to random to suit the intended use. John von Neumann cautioned about the misinterpretation of a PRNG as a truly random generator, joking that "Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin." [3]

Potential issues

In practice, the output from many common PRNGs exhibit artifacts that cause them to fail statistical pattern-detection tests. These include:

Defects exhibited by flawed PRNGs range from unnoticeable (and unknown) to very obvious. An example was the RANDU random number algorithm used for decades on mainframe computers. It was seriously flawed, but its inadequacy went undetected for a very long time.

In many fields, research work prior to the 21st century that relied on random selection or on Monte Carlo simulations, or in other ways relied on PRNGs, were much less reliable than ideal as a result of using poor-quality PRNGs. [4] Even today, caution is sometimes required, as illustrated by the following warning in the International Encyclopedia of Statistical Science (2010). [5]

The list of widely used generators that should be discarded is much longer [than the list of good generators]. Do not trust blindly the software vendors. Check the default RNG of your favorite software and be ready to replace it if needed. This last recommendation has been made over and over again over the past 40 years. Perhaps amazingly, it remains as relevant today as it was 40 years ago.

As an illustration, consider the widely used programming language Java. Up until 2020, Java still relied on a linear congruential generator (LCG) for its PRNG, [6] [7] which is of low quality (see further below). Java support was upgraded with Java 17.

One well-known PRNG to avoid major problems and still run fairly quickly is the Mersenne Twister (discussed below), which was published in 1998. Other higher-quality PRNGs, both in terms of computational and statistical performance, were developed before and after this date; these can be identified in the List of pseudorandom number generators.

Generators based on linear recurrences

In the second half of the 20th century, the standard class of algorithms used for PRNGs comprised linear congruential generators. The quality of LCGs was known to be inadequate, but better methods were unavailable. Press et al. (2007) described the result thus: "If all scientific papers whose results are in doubt because of [LCGs and related] were to disappear from library shelves, there would be a gap on each shelf about as big as your fist." [8]

A major advance in the construction of pseudorandom generators was the introduction of techniques based on linear recurrences on the two-element field; such generators are related to linear-feedback shift registers.

The 1997 invention of the Mersenne Twister, [9] in particular, avoided many of the problems with earlier generators. The Mersenne Twister has a period of 219 937  1 iterations (≈ 4.3×106001), is proven to be equidistributed in (up to) 623 dimensions (for 32-bit values), and at the time of its introduction was running faster than other statistically reasonable generators.

In 2003, George Marsaglia introduced the family of xorshift generators, [10] again based on a linear recurrence. Such generators are extremely fast and, combined with a nonlinear operation, they pass strong statistical tests. [11] [12] [13]

In 2006, the WELL family of generators was developed. [14] The WELL generators in some ways improves on the quality of the Mersenne Twister, which has a too-large state space and a very slow recovery from state spaces with a large number of zeros.

Cryptographic PRNGs

A PRNG suitable for cryptographic applications is called a cryptographically-secure PRNG (CSPRNG). A requirement for a CSPRNG is that an adversary not knowing the seed has only negligible advantage in distinguishing the generator's output sequence from a random sequence. In other words, while a PRNG is only required to pass certain statistical tests, a CSPRNG must pass all statistical tests that are restricted to polynomial time in the size of the seed. Though a proof of this property is beyond the current state of the art of computational complexity theory, strong evidence may be provided by reducing to the CSPRNG from a problem that is assumed to be hard, such as integer factorization. [15] In general, years of review may be required before an algorithm can be certified as a CSPRNG.

Some classes of CSPRNGs include the following:

It has been shown to be likely that the NSA has inserted an asymmetric backdoor into the NIST-certified pseudorandom number generator Dual_EC_DRBG. [19]

Most PRNG algorithms produce sequences that are uniformly distributed by any of several tests. It is an open question, and one central to the theory and practice of cryptography, whether there is any way to distinguish the output of a high-quality PRNG from a truly random sequence. In this setting, the distinguisher knows that either the known PRNG algorithm was used (but not the state with which it was initialized) or a truly random algorithm was used, and has to distinguish between the two. [20] The security of most cryptographic algorithms and protocols using PRNGs is based on the assumption that it is infeasible to distinguish use of a suitable PRNG from use of a truly random sequence. The simplest examples of this dependency are stream ciphers, which (most often) work by exclusive or-ing the plaintext of a message with the output of a PRNG, producing ciphertext. The design of cryptographically adequate PRNGs is extremely difficult because they must meet additional criteria. The size of its period is an important factor in the cryptographic suitability of a PRNG, but not the only one.

BSI evaluation criteria

The German Federal Office for Information Security (German : Bundesamt für Sicherheit in der Informationstechnik, BSI) has established four criteria for quality of deterministic random number generators. [21] They are summarized here:

For cryptographic applications, only generators meeting the K3 or K4 standards are acceptable.

Mathematical definition

Given:

We call a function (where is the set of positive integers) a pseudo-random number generator for given taking values in if and only if:

( denotes the number of elements in the finite set .)

It can be shown that if is a pseudo-random number generator for the uniform distribution on and if is the CDF of some given probability distribution , then is a pseudo-random number generator for , where is the percentile of , i.e. . Intuitively, an arbitrary distribution can be simulated from a simulation of the standard uniform distribution.

Early approaches

An early computer-based PRNG, suggested by John von Neumann in 1946, is known as the middle-square method. The algorithm is as follows: take any number, square it, remove the middle digits of the resulting number as the "random number", then use that number as the seed for the next iteration. For example, squaring the number "1111" yields "1234321", which can be written as "01234321", an 8-digit number being the square of a 4-digit number. This gives "2343" as the "random" number. Repeating this procedure gives "4896" as the next result, and so on. Von Neumann used 10 digit numbers, but the process was the same.

A problem with the "middle square" method is that all sequences eventually repeat themselves, some very quickly, such as "0000". Von Neumann was aware of this, but he found the approach sufficient for his purposes and was worried that mathematical "fixes" would simply hide errors rather than remove them.

Von Neumann judged hardware random number generators unsuitable, for, if they did not record the output generated, they could not later be tested for errors. If they did record their output, they would exhaust the limited computer memories then available, and so the computer's ability to read and write numbers. If the numbers were written to cards, they would take very much longer to write and read. On the ENIAC computer he was using, the "middle square" method generated numbers at a rate some hundred times faster than reading numbers in from punched cards.

The middle-square method has since been supplanted by more elaborate generators.

A recent innovation is to combine the middle square with a Weyl sequence. This method produces high-quality output through a long period (see middle-square method).

Non-uniform generators

Numbers selected from a non-uniform probability distribution can be generated using a uniform distribution PRNG and a function that relates the two distributions.

First, one needs the cumulative distribution function of the target distribution :

Note that . Using a random number c from a uniform distribution as the probability density to "pass by", we get

so that

is a number randomly selected from distribution . This is based on the inverse transform sampling.

For example, the inverse of cumulative Gaussian distribution with an ideal uniform PRNG with range (0, 1) as input would produce a sequence of (positive only) values with a Gaussian distribution; however

Similar considerations apply to generating other non-uniform distributions such as Rayleigh and Poisson.

See also

Related Research Articles

A pseudorandom sequence of numbers is one that appears to be statistically random, despite having been produced by a completely deterministic and repeatable process. Simply put, the problem is that many of the sources of randomness available to humans rely on physical processes not readily available to computer programs.

<span class="mw-page-title-main">Linear congruential generator</span> Algorithm for generating pseudo-randomized numbers

A linear congruential generator (LCG) is an algorithm that yields a sequence of pseudo-randomized numbers calculated with a discontinuous piecewise linear equation. The method represents one of the oldest and best-known pseudorandom number generator algorithms. The theory behind them is relatively easy to understand, and they are easily implemented and fast, especially on computer hardware which can provide modular arithmetic by storage-bit truncation.

The Mersenne Twister is a general-purpose pseudorandom number generator (PRNG) developed in 1997 by Makoto Matsumoto and Takuji Nishimura. Its name derives from the fact that its period length is chosen to be a Mersenne prime.

The Yarrow algorithm is a family of cryptographic pseudorandom number generators (CSPRNG) devised by John Kelsey, Bruce Schneier, and Niels Ferguson and published in 1999. The Yarrow algorithm is explicitly unpatented, royalty-free, and open source; no license is required to use it. An improved design from Ferguson and Schneier, Fortuna, is described in their book, Practical Cryptography

<span class="mw-page-title-main">Hardware random number generator</span> Cryptographic device

In computing, a hardware random number generator (HRNG), true random number generator (TRNG), non-deterministic random bit generator (NRBG), or physical random number generator is a device that generates random numbers from a physical process capable of producing entropy, unlike the pseudorandom number generator that utilizes a deterministic algorithm and non-physical nondeterministic random bit generators that do not include hardware dedicated to generation of entropy.

A cryptographically secure pseudorandom number generator (CSPRNG) or cryptographic pseudorandom number generator (CPRNG) is a pseudorandom number generator (PRNG) with properties that make it suitable for use in cryptography. It is also loosely known as a cryptographic random number generator (CRNG).

The security of cryptographic systems depends on some secret data that is known to authorized persons but unknown and unpredictable to others. To achieve this unpredictability, some randomization is typically employed. Modern cryptographic protocols often require frequent generation of random quantities. Cryptographic attacks that subvert or exploit weaknesses in this process are known as random number generator attacks.

A random permutation is a random ordering of a set of objects, that is, a permutation-valued random variable. The use of random permutations is often fundamental to fields that use randomized algorithms such as coding theory, cryptography, and simulation. A good example of a random permutation is the shuffling of a deck of cards: this is ideally a random permutation of the 52 cards.

Inversive congruential generators are a type of nonlinear congruential pseudorandom number generator, which use the modular multiplicative inverse to generate the next number in a sequence. The standard formula for an inversive congruential generator, modulo some prime q is:

In theoretical computer science and cryptography, a pseudorandom generator (PRG) for a class of statistical tests is a deterministic procedure that maps a random seed to a longer pseudorandom string such that no statistical test in the class can distinguish between the output of the generator and the uniform distribution. The random seed itself is typically a short binary string drawn from the uniform distribution.

<span class="mw-page-title-main">Random number generation</span> Producing a sequence that cannot be predicted better than by random chance

Random number generation is a process by which, often by means of a random number generator (RNG), a sequence of numbers or symbols that cannot be reasonably predicted better than by random chance is generated. This means that the particular outcome sequence will contain some patterns detectable in hindsight but impossible to foresee. True random number generators can be hardware random-number generators (HRNGs), wherein each generation is a function of the current value of a physical environment's attribute that is constantly changing in a manner that is practically impossible to model. This would be in contrast to so-called "random number generations" done by pseudorandom number generators (PRNGs), which generate numbers that only look random but are in fact pre-determined—these generations can be reproduced simply by knowing the state of the PRNG.

In cryptography, a pseudorandom function family, abbreviated PRF, is a collection of efficiently-computable functions which emulate a random oracle in the following way: no efficient algorithm can distinguish between a function chosen randomly from the PRF family and a random oracle. Pseudorandom functions are vital tools in the construction of cryptographic primitives, especially secure encryption schemes.

A randomness test, in data evaluation, is a test used to analyze the distribution of a set of data to see whether it can be described as random (patternless). In stochastic modeling, as in some computer simulations, the hoped-for randomness of potential input data can be verified, by a formal test for randomness, to show that the data are valid for use in simulation runs. In some cases, data reveals an obvious non-random pattern, as with so-called "runs in the data". If a selected set of data fails the tests, then parameters can be changed or other randomized data can be used which does pass the tests for randomness.

<span class="mw-page-title-main">Fisher–Yates shuffle</span> Algorithm for generating a random permutation of a finite set

The Fisher–Yates shuffle is an algorithm for shuffling a finite sequence. The algorithm takes a list of all the elements of the sequence, and continually determines the next element in the shuffled sequence by randomly drawing an element from the list until no elements remain. The algorithm produces an unbiased permutation: every permutation is equally likely. The modern version of the algorithm takes time proportional to the number of items being shuffled and shuffles them in place.

Dual_EC_DRBG is an algorithm that was presented as a cryptographically secure pseudorandom number generator (CSPRNG) using methods in elliptic curve cryptography. Despite wide public criticism, including the public identification of the possibility that the National Security Agency put a backdoor into a recommended implementation, it was for seven years one of four CSPRNGs standardized in NIST SP 800-90A as originally published circa June 2006, until it was withdrawn in 2014.

A randomness extractor, often simply called an "extractor", is a function, which being applied to output from a weak entropy source, together with a short, uniformly random seed, generates a highly random output that appears independent from the source and uniformly distributed. Examples of weakly random sources include radioactive decay or thermal noise; the only restriction on possible sources is that there is no way they can be fully controlled, calculated or predicted, and that a lower bound on their entropy rate can be established. For a given source, a randomness extractor can even be considered to be a true random number generator (TRNG); but there is no single extractor that has been proven to produce truly random output from any type of weakly random source.

In 1997, Moni Naor and Omer Reingold described efficient constructions for various cryptographic primitives in private key as well as public-key cryptography. Their result is the construction of an efficient pseudorandom function. Let p and l be prime numbers with l |p−1. Select an element g of multiplicative order l. Then for each (n+1)-dimensional vector a = (a0,a1, ..., an)∈ they define the function

<span class="mw-page-title-main">OpenPuff</span>

OpenPuff Steganography and Watermarking, sometimes abbreviated OpenPuff or Puff, is a free steganography tool for Microsoft Windows created by Cosimo Oliboni and still maintained as independent software. The program is notable for being the first steganography tool that:

A counter-based random number generation is a kind of pseudorandom number generator that uses only an integer counter as its internal state. They are generally used for generating pseudorandom numbers for large parallel computations.

References

  1. Barker, Elaine; Barker, William; Burr, William; Polk, William; Smid, Miles (July 2012). "Recommendation for Key Management" (PDF). NIST Special Publication 800-57. NIST. doi:10.6028/NIST.SP.800-57p1r3 . Retrieved 19 August 2013.
  2. "Pseudorandom number generators". Khan Academy. Retrieved 2016-01-11.
  3. Von Neumann, John (1951). "Various techniques used in connection with random digits" (PDF). National Bureau of Standards Applied Mathematics Series. 12: 36–38.
  4. Press et al. (2007), chap.7
  5. L'Ecuyer, Pierre (2010). "Uniform random number generators". In Lovric, Miodrag (ed.). International Encyclopedia of Statistical Science . Springer. p. 1629. ISBN   978-3-642-04897-5.
  6. Random (Java Platform SE 8), Java Platform Standard Edition 8 Documentation.
  7. Random.java at OpenJDK.
  8. Press et al. (2007) §7.1
  9. Matsumoto, Makoto; Nishimura, Takuji (1998). "Mersenne twister: a 623-dimensionally equi-distributed uniform pseudo-random number generator" (PDF). ACM Transactions on Modeling and Computer Simulation. ACM. 8 (1): 3–30. doi:10.1145/272991.272995. S2CID   3332028.
  10. Marsaglia, George (July 2003). "Xorshift RNGs". Journal of Statistical Software . 8 (14). doi: 10.18637/jss.v008.i14 . S2CID   250501391.
  11. S.Vigna. "xorshift*/xorshift+ generators and the PRNG shootout".
  12. Vigna S. (2016), "An experimental exploration of Marsaglia’s xorshift generators", ACM Transactions on Mathematical Software , 42; doi : 10.1145/2845077.
  13. Vigna S. (2017), "Further scramblings of Marsaglia’s xorshift generators", Journal of Computational and Applied Mathematics, 315; doi : 10.1016/j.cam.2016.11.006.
  14. Panneton, François; L'Ecuyer, Pierre; Matsumoto, Makoto (2006). "Improved long-period generators based on linear recurrences modulo 2" (PDF). ACM Transactions on Mathematical Software . 32 (1): 1–16. doi:10.1145/1132973.1132974. S2CID   7368302.
  15. Song Y. Yan (7 December 2007). Cryptanalytic Attacks on RSA. Springer, 2007. p. 73. ISBN   978-0-387-48741-0.
  16. Niels Ferguson, Bruce Schneier, Tadayoshi Kohno (2010). "Cryptography Engineering: Design Principles and Practical Applications, Chapter 9.4: The Generator" (PDF).{{cite web}}: CS1 maint: multiple names: authors list (link)
  17. Klaus Pommerening (2016). "IV.4 Perfect Random Generators". Cryptology. uni-mainz.de . Retrieved 2017-11-12.
  18. Pass, Rafael. "Lecture 11: The Goldreich-Levin Theorem" (PDF). COM S 687 Introduction to Cryptography. Retrieved 20 July 2016.
  19. Matthew Green (18 September 2013). "The Many Flaws of Dual_EC_DRBG".
  20. Katz, Jonathan; Yehuda, Lindell (2014). Introduction to modern cryptography. CRC press. p. 70.
  21. 1 2 Schindler, Werner (2 December 1999). "Functionality Classes and Evaluation Methodology for Deterministic Random Number Generators" (PDF). Anwendungshinweise und Interpretationen (AIS). Bundesamt für Sicherheit in der Informationstechnik. pp. 5–11. Retrieved 19 August 2013.
  22. "Security requirements for cryptographic modules". FIPS . NIST. 1994-01-11. p. 4.11.1 Power-Up Tests. Archived from the original on May 27, 2013. Retrieved 19 August 2013.

Bibliography