Randomness test

Last updated

A randomness test (or test for randomness), in data evaluation, is a test used to analyze the distribution of a set of data to see whether it can be described as random (patternless). In stochastic modeling, as in some computer simulations, the hoped-for randomness of potential input data can be verified, by a formal test for randomness, to show that the data are valid for use in simulation runs. In some cases, data reveals an obvious non-random pattern, as with so-called "runs in the data" (such as expecting random 0–9 but finding "4 3 2 1 0 4 3 2 1..." and rarely going above 4). If a selected set of data fails the tests, then parameters can be changed or other randomized data can be used which does pass the tests for randomness.

Contents

Background

The issue of randomness is an important philosophical and theoretical question. Tests for randomness can be used to determine whether a data set has a recognisable pattern, which would indicate that the process that generated it is significantly non-random. For the most part, statistical analysis has, in practice, been much more concerned with finding regularities in data as opposed to testing for randomness. Many "random number generators" in use today are defined by algorithms, and so are actually pseudo-random number generators. The sequences they produce are called pseudo-random sequences. These generators do not always generate sequences which are sufficiently random, but instead can produce sequences which contain patterns. For example, the infamous RANDU routine fails many randomness tests dramatically, including the spectral test.

Stephen Wolfram used randomness tests on the output of Rule 30 to examine its potential for generating random numbers, [1] though it was shown to have an effective key size far smaller than its actual size [2] and to perform poorly on a chi-squared test. [3] The use of an ill-conceived random number generator can put the validity of an experiment in doubt by violating statistical assumptions. Though there are commonly used statistical testing techniques such as NIST standards, Yongge Wang showed that NIST standards are not sufficient. Furthermore, Yongge Wang [4] designed statistical–distance–based and law–of–the–iterated–logarithm–based testing techniques. Using this technique, Yongge Wang and Tony Nicol [5] detected the weakness in commonly used pseudorandom generators such as the well known Debian version of OpenSSL pseudorandom generator which was fixed in 2008.

Specific tests for randomness

There have been a fairly small number of different types of (pseudo-)random number generators used in practice. They can be found in the list of random number generators, and have included:

These different generators have varying degrees of success in passing the accepted test suites. Several widely used generators fail the tests more or less badly, while other 'better' and prior generators (in the sense that they passed all current batteries of tests and they already existed) have been largely ignored.

There are many practical measures of randomness for a binary sequence. These include measures based on statistical tests, transforms, and complexity or a mixture of these. A well-known and widely used collection of tests was the Diehard Battery of Tests, introduced by Marsaglia; this was extended to the TestU01 suite by L'Ecuyer and Simard. The use of Hadamard transform to measure randomness was proposed by S. Kak and developed further by Phillips, Yuen, Hopkins, Beth and Dai, Mund, and Marsaglia and Zaman. [6]

Several of these tests, which are of linear complexity, provide spectral measures of randomness. T. Beth and Z-D. Dai purported to show that Kolmogorov complexity and linear complexity are practically the same, [7] although Y. Wang later showed their claims are incorrect. [8] Nevertheless, Wang also demonstrated that for Martin-Löf random sequences, the Kolmogorov complexity is essentially the same as linear complexity.

These practical tests make it possible to compare the randomness of strings. On probabilistic grounds, all strings of a given length have the same randomness. However different strings have a different Kolmogorov complexity. For example, consider the following two strings.

String 1: 0101010101010101010101010101010101010101010101010101010101010101
String 2: 1100100001100001110111101110110011111010010000100101011110010110

String 1 admits a short linguistic description: "32 repetitions of '01'". This description has 22 characters, and it can be efficiently constructed out of some basis sequences.[ clarification needed ] String 2 has no obvious simple description other than writing down the string itself, which has 64 characters,[ clarification needed ] and it has no comparably efficient basis function representation. Using linear Hadamard spectral tests (see Hadamard transform), the first of these sequences will be found to be of much less randomness than the second one, which agrees with intuition.

Notable software implementations

See also

Notes

  1. Wolfram, Stephen (2002). A New Kind of Science . Wolfram Media, Inc. pp.  975–976. ISBN   978-1-57955-008-0.
  2. Willi Meier; Othmar Staffelbach (1991). "Analysis of Pseudo Random Sequences Generated by Cellular Automata". Advances in Cryptology — EUROCRYPT '91. Lecture Notes in Computer Science. Vol. 547. pp. 186–199. doi:10.1007/3-540-46416-6_17. ISBN   978-3-540-54620-7.
  3. Moshe Sipper; Marco Tomassini (1996), "Generating parallel random number generators by cellular programming", International Journal of Modern Physics C, 7 (2): 181–190, Bibcode:1996IJMPC...7..181S, CiteSeerX   10.1.1.21.870 , doi:10.1142/S012918319600017X .
  4. Yongge Wang. On the Design of LIL Tests for (Pseudo) Random Generators and Some Experimental Results, http://webpages.uncc.edu/yonwang/, 2014
  5. Yongge Wang; Tony Nicol (2014), "Statistical Properties of Pseudo Random Sequences and Experiments with PHP and Debian OpenSSL", Esorics 2014, LNCS 8712: 454–471
  6. Terry Ritter, "Randomness tests: a literature survey", webpage: CBR-rand.
  7. Beth, T. and Z-D. Dai. 1989. On the Complexity of Pseudo-Random Sequences -- or: If You Can Describe a Sequence It Can't be Random. Advances in Cryptology – EUROCRYPT '89. 533-543. Springer-Verlag
  8. Yongge Wang 1999. Linear complexity versus pseudorandomness: on Beth and Dai's result. In: Proc. Asiacrypt 99, pages 288--298. LNCS 1716, Springer Verlag
  9. ENT: A Pseudorandom Number Sequence Test Program, Fourmilab, 2008.
  10. A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications, Special Publication 800-22 Revision 1a, National Institute of Standards and Technology, 2010.
  11. Implementation of the NIST Statistical Test Suite

Related Research Articles

A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG), is an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers. The PRNG-generated sequence is not truly random, because it is completely determined by an initial value, called the PRNG's seed. Although sequences that are closer to truly random can be generated using hardware random number generators, pseudorandom number generators are important in practice for their speed in number generation and their reproducibility.

<span class="mw-page-title-main">Linear congruential generator</span> Algorithm for generating pseudo-randomized numbers

A linear congruential generator (LCG) is an algorithm that yields a sequence of pseudo-randomized numbers calculated with a discontinuous piecewise linear equation. The method represents one of the oldest and best-known pseudorandom number generator algorithms. The theory behind them is relatively easy to understand, and they are easily implemented and fast, especially on computer hardware which can provide modular arithmetic by storage-bit truncation.

A Lagged Fibonacci generator is an example of a pseudorandom number generator. This class of random number generator is aimed at being an improvement on the 'standard' linear congruential generator. These are based on a generalisation of the Fibonacci sequence.

The concept of a random sequence is essential in probability theory and statistics. The concept generally relies on the notion of a sequence of random variables and many statistical discussions begin with the words "let X1,...,Xn be independent random variables...". Yet as D. H. Lehmer stated in 1951: "A random sequence is a vague notion... in which each term is unpredictable to the uninitiated and whose digits pass a certain number of tests traditional with statisticians".

<span class="mw-page-title-main">Hardware random number generator</span> Cryptographic device

In computing, a hardware random number generator (HRNG), true random number generator (TRNG), non-deterministic random bit generator (NRBG), or physical random number generator is a device that generates random numbers from a physical process capable of producing entropy, unlike the pseudorandom number generator that utilizes a deterministic algorithm and non-physical nondeterministic random bit generators that do not include hardware dedicated to generation of entropy.

A cryptographically secure pseudorandom number generator (CSPRNG) or cryptographic pseudorandom number generator (CPRNG) is a pseudorandom number generator (PRNG) with properties that make it suitable for use in cryptography. It is also loosely known as a cryptographic random number generator (CRNG).

In theoretical computer science and cryptography, a pseudorandom generator (PRG) for a class of statistical tests is a deterministic procedure that maps a random seed to a longer pseudorandom string such that no statistical test in the class can distinguish between the output of the generator and the uniform distribution. The random seed itself is typically a short binary string drawn from the uniform distribution.

A pseudorandom binary sequence (PRBS), pseudorandom binary code or pseudorandom bitstream is a binary sequence that, while generated with a deterministic algorithm, is difficult to predict and exhibits statistical behavior similar to a truly random sequence. PRBS generators are used in telecommunication, such as in analog-to-information conversion, but also in encryption, simulation, correlation technique and time-of-flight spectroscopy. The most common example is the maximum length sequence generated by a (maximal) linear feedback shift register (LFSR). Other examples are Gold sequences, Kasami sequences and JPL sequences, all based on LFSRs.

George Marsaglia was an American mathematician and computer scientist. He is best known for creating the diehard tests, a suite of software for measuring statistical randomness.

A numeric sequence is said to be statistically random when it contains no recognizable patterns or regularities; sequences such as the results of an ideal dice roll or the digits of π exhibit statistical randomness.

A nonlinear-feedback shift register (NLFSR) is a shift register whose input bit is a non-linear function of its previous state.

<span class="mw-page-title-main">Random number generation</span> Producing a sequence that cannot be predicted better than by random chance

Random number generation is a process by which, often by means of a random number generator (RNG), a sequence of numbers or symbols that cannot be reasonably predicted better than by random chance is generated. This means that the particular outcome sequence will contain some patterns detectable in hindsight but impossible to foresee. True random number generators can be hardware random-number generators (HRNGs), wherein each generation is a function of the current value of a physical environment's attribute that is constantly changing in a manner that is practically impossible to model. This would be in contrast to so-called "random number generations" done by pseudorandom number generators (PRNGs), which generate numbers that only look random but are in fact pre-determined—these generations can be reproduced simply by knowing the state of the PRNG.

Algorithmic information theory (AIT) is a branch of theoretical computer science that concerns itself with the relationship between computation and information of computably generated objects (as opposed to stochastically generated), such as strings or any other data structure. In other words, it is shown within algorithmic information theory that computational incompressibility "mimics" (except for a constant that only depends on the chosen universal programming language) the relations or inequalities found in information theory. According to Gregory Chaitin, it is "the result of putting Shannon's information theory and Turing's computability theory into a cocktail shaker and shaking vigorously."

<span class="mw-page-title-main">Xorshift</span> Class of pseudorandom number generators

Xorshift random number generators, also called shift-register generators, are a class of pseudorandom number generators that were invented by George Marsaglia. They are a subset of linear-feedback shift registers (LFSRs) which allow a particularly efficient implementation in software without the excessive use of sparse polynomials. They generate the next number in their sequence by repeatedly taking the exclusive or of a number with a bit-shifted version of itself. This makes execution extremely efficient on modern computer architectures, but it does not benefit efficiency in a hardware implementation. Like all LFSRs, the parameters have to be chosen very carefully in order to achieve a long period.

<span class="mw-page-title-main">Randomness</span> Apparent lack of pattern or predictability in events

In common usage, randomness is the apparent or actual lack of definite pattern or predictability in information. A random sequence of events, symbols or steps often has no order and does not follow an intelligible pattern or combination. Individual random events are, by definition, unpredictable, but if there is a known probability distribution, the frequency of different outcomes over repeated events is predictable. For example, when throwing two dice, the outcome of any particular roll is unpredictable, but a sum of 7 will tend to occur twice as often as 4. In this view, randomness is not haphazardness; it is a measure of uncertainty of an outcome. Randomness applies to concepts of chance, probability, and information entropy.

<span class="mw-page-title-main">OpenPuff</span>

OpenPuff Steganography and Watermarking, sometimes abbreviated OpenPuff or Puff, is a free steganography tool for Microsoft Windows created by Cosimo Oliboni and still maintained as independent software. The program is notable for being the first steganography tool that:

Yongge Wang is a computer science professor at the University of North Carolina at Charlotte specialized in algorithmic complexity and cryptography. He is the inventor of IEEE P1363 cryptographic standards SRP5 and WANG-KE and has contributed to the mathematical theory of algorithmic randomness. He co-authored a paper demonstrating that a recursively enumerable real number is an algorithmically random sequence if and only if it is a Chaitin's constant for some encoding of programs. He also showed the separation of Schnorr randomness from recursive randomness. He also invented a distance based statistical testing technique to improve NIST SP800-22 testing in randomness tests. In cryptographic research, he is known for the invention of the quantum resistant random linear code based encryption scheme RLCE.

KISS (Keep it Simple Stupid) is a family of pseudorandom number generators introduced by George Marsaglia. Starting from 1998 Marsaglia posted on various newsgroups including sci.math, comp.lang.c, comp.lang.fortran and sci.stat.math several versions of the generators. All KISS generators combine three or four independent random number generators with a view to improving the quality of randomness. KISS generators produce 32-bit or 64-bit random integers, from which random floating-point numbers can be constructed if desired. The original 1993 generator is based on the combination of a linear congruential generator and of two linear feedback shift-register generators. It has a period 295, good speed and good statistical properties; however, it fails the LinearComplexity test in the Crush and BigCrush tests of the TestU01 suite. A newer version from 1999 is based on a linear congruential generator, a 3-shift linear feedback shift-register and two multiply-with-carry generators. It is 10–20% slower than the 1993 version but has a larger period 2123 and passes all tests in TestU01. In 2009 Marsaglia presented a version based on 64-bit integers (appropriate for 64-bit processors) which combines a multiply-with-carry generator, a Xorshift generator and a linear congruential generator. It has a period of around 2250 (around 1075).

xoroshiro128+ is a pseudorandom number generator intended as a successor to xorshift+. Instead of perpetuating Marsaglia's tradition of xorshift as a basic operation, xoroshiro128+ uses a shift/rotate-based linear transformation designed by Sebastiano Vigna in collaboration with David Blackman. The result is a significant improvement in speed and statistical quality.