Xorshift random number generators, also called shift-register generators, are a class of pseudorandom number generators that were invented by George Marsaglia. [1] They are a subset of linear-feedback shift registers (LFSRs) which allow a particularly efficient implementation in software without the excessive use of sparse polynomials. [2] They generate the next number in their sequence by repeatedly taking the exclusive or of a number with a bit-shifted version of itself. This makes execution extremely efficient on modern computer architectures, but it does not benefit efficiency in a hardware implementation. Like all LFSRs, the parameters have to be chosen very carefully in order to achieve a long period. [3]
For execution in software, xorshift generators are among the fastest PRNGs, requiring very small code and state. However, they do not pass every statistical test without further refinement. This weakness is amended by combining them with a non-linear function, as described in the original paper. Because plain xorshift generators (without a non-linear step) fail some statistical tests, they have been accused of being unreliable. [3] : 360
A C version [lower-alpha 1] of three xorshift algorithms [1] : 4,5 is given here. The first has one 32-bit word of state, and period 232−1. The second has one 64-bit word of state and period 264−1. The last one has four 32-bit words of state, and period 2128−1. The 128-bit algorithm passes the diehard tests. However, it fails the MatrixRank and LinearComp tests of the BigCrush test suite from the TestU01 framework.
All use three shifts and three or four exclusive-or operations:
#include<stdint.h>structxorshift32_state{uint32_ta;};/* The state must be initialized to non-zero */uint32_txorshift32(structxorshift32_state*state){/* Algorithm "xor" from p. 4 of Marsaglia, "Xorshift RNGs" */uint32_tx=state->a;x^=x<<13;x^=x>>17;x^=x<<5;returnstate->a=x;}structxorshift64_state{uint64_ta;};uint64_txorshift64(structxorshift64_state*state){uint64_tx=state->a;x^=x<<13;x^=x>>7;x^=x<<17;returnstate->a=x;}/* struct xorshift128_state can alternatively be defined as a pair of uint64_t or a uint128_t where supported */structxorshift128_state{uint32_tx[4];};/* The state must be initialized to non-zero */uint32_txorshift128(structxorshift128_state*state){/* Algorithm "xor128" from p. 5 of Marsaglia, "Xorshift RNGs" */uint32_tt=state->x[3];uint32_ts=state->x[0];/* Perform a contrived 32-bit shift. */state->x[3]=state->x[2];state->x[2]=state->x[1];state->x[1]=s;t^=t<<11;t^=t>>8;returnstate->x[0]=t^s^(s>>19);}
In case of one 64-bit word of state, there exist parameters which hold period 264−1 with two pair of exclusive-or and shift. [4]
#include<stdint.h>structxorshift64_state{uint64_ta;};uint64_txorshift64(structxorshift64_state*state){uint64_tx=state->a;x^=x<<7;x^=x>>9;returnstate->a=x;}
All xorshift generators fail some tests in the BigCrush test suite. This is true for all generators based on linear recurrences, such as the Mersenne Twister or WELL. However, it is easy to scramble the output of such generators to improve their quality.
The scramblers known as + and * still leave weakness in the low bits, [5] so they are intended for floating point use, where the lowest bits of floating-point numbers have a smaller impact on the interpreted value. [6] For general purpose, the scrambler ** (pronounced starstar) makes the LFSR generators pass in all bits.
Marsaglia suggested scrambling the output by combining it with a simple additive counter modulo 232 (which he calls a "Weyl sequence" after Weyl's equidistribution theorem). This also increases the period by a factor of 232, to 2192−232:
#include<stdint.h>structxorwow_state{uint32_tx[5];uint32_tcounter;};/* The state array must be initialized to not be all zero in the first four words */uint32_txorwow(structxorwow_state*state){/* Algorithm "xorwow" from p. 5 of Marsaglia, "Xorshift RNGs" */uint32_tt=state->x[4];uint32_ts=state->x[0];/* Perform a contrived 32-bit rotate. */state->x[4]=state->x[3];state->x[3]=state->x[2];state->x[2]=state->x[1];state->x[1]=s;t^=t>>2;t^=t<<1;t^=s^(s<<4);state->x[0]=t;state->counter+=362437;returnt+state->counter;}
This performs well, but fails a few tests in BigCrush. [7] This generator is the default in Nvidia's CUDA toolkit. [8]
An xorshift* generator applies an invertible multiplication (modulo the word size) as a non-linear transformation to the output of an xorshift generator, as suggested by Marsaglia. [1] All xorshift* generators emit a sequence of values that is equidistributed in the maximum possible dimension (except that they will never output zero for 16 calls, i.e. 128 bytes, in a row). [9]
The following 64-bit generator has a maximal period of 264−1. [9]
#include<stdint.h>/* xorshift64s, variant A_1(12,25,27) with multiplier M_32 from line 3 of table 5 */uint64_txorshift64star(void){/* initial seed must be nonzero, don't use a static variable for the state if multithreaded */staticuint64_tx=1;x^=x>>12;x^=x<<25;x^=x>>27;returnx*0x2545F4914F6CDD1DULL;}
The generator fails only the MatrixRank test of BigCrush, however if the generator is modified to return only the high 32 bits, then it passes BigCrush with zero failures. [10] : 7 In fact, a reduced version with only 40 bits of internal state passes the suite, suggesting a large safety margin. [10] : 19 A similar generator suggested in Numerical Recipes [11] as RanQ1
also fails the BirthdaySpacings test.
Vigna [9] suggests the following xorshift1024* generator with 1024 bits of state and a maximal period of 21024−1; however, it does not always pass BigCrush. [5] xoshiro256** is therefore a much better option.
#include<stdint.h>/* The state must be seeded so that there is at least one non-zero element in array */structxorshift1024s_state{uint64_tx[16];intindex;};uint64_txorshift1024s(structxorshift1024s_state*state){intindex=state->index;uint64_tconsts=state->x[index++];uint64_tt=state->x[index&=15];t^=t<<31;// at^=t>>11;// b -- Again, the shifts and the multipliers are tunablet^=s^(s>>30);// cstate->x[index]=t;state->index=index;returnt*1181783497276652981ULL;}
An xorshift+ generator can achieve an order of magnitude fewer failures than Mersenne Twister or WELL. A native C implementation of an xorshift+ generator that passes all tests from the BigCrush suite can typically generate a random number in fewer than 10 clock cycles on x86, thanks to instruction pipelining. [12]
Rather than using multiplication, it is possible to use addition as a faster non-linear transformation. The idea was first proposed by Saito and Matsumoto (also responsible for the Mersenne Twister) in the XSadd generator, which adds two consecutive outputs of an underlying xorshift generator based on 32-bit shifts. [13] However, one disadvantage of adding consecutive outputs is that, while the underlying xorshift128 generator is 2-dimensionally equidistributed, the xorshift128+ generator is only 1-dimensionally equidistributed. [14]
XSadd has some weakness in the low-order bits of its output; it fails several BigCrush tests when the output words are bit-reversed. To correct this problem, Vigna introduced the xorshift+ family, [14] based on 64-bit shifts. xorshift+ generators, even as large as xorshift1024+, exhibit some detectable linearity in the low-order bits of their output; [5] it passes BigCrush, but doesn't when the 32 lowest-order bits are used in reverse order from each 64-bit word. [5] This generator is one of the fastest generators passing BigCrush. [12]
The following xorshift128+ generator uses 128 bits of state and has a maximal period of 2128−1.
#include<stdint.h>structxorshift128p_state{uint64_tx[2];};/* The state must be seeded so that it is not all zero */uint64_txorshift128p(structxorshift128p_state*state){uint64_tt=state->x[0];uint64_tconsts=state->x[1];state->x[0]=s;t^=t<<23;// at^=t>>18;// b -- Again, the shifts and the multipliers are tunablet^=s^(s>>5);// cstate->x[1]=t;returnt+s;}
xorshiftr+ (r stands for reduced; reads "xorshifter plus") generator was mainly based on xorshift+ yet incorporates modifications making it significantly faster (especially on lightweight devices) and more successful in randomness tests (including TestU01 BigCrush suite) compared to its predecessors. [15] It is one of the fastest generators passing all tests in TestU01's BigCrush suite. Like xorshift+, a native C implementation of an xorshiftr+ generator that passes all tests from the BigCrush suite can typically generate a random number in fewer than 10 clock cycles on x86, thanks to instruction pipelining. [12] [15]
Unlike xorshift+, xorshiftr+ does not return the sum of two variables derived from the state using xorshift-style steps, rather it returns a single variable with the very last operation in its cycle; however, it features an addition just before returning a value, namely in the phase of adjusting the seed for the next cycle; hence the "+" in the name of the algorithm. The variable sizes, including the state, can be increased with no compromise to the randomness scores, but performance drops may be observed on lightweight devices.
The following xorshiftr128+ generator uses 128 bits of state (with two variables) and has a maximal period of 2128−1.
#include<stdint.h>structxorshiftr128plus_state{uint64_ts[2];// seeds};/* The state must be seeded so that it is not all zero */uint64_txorshiftr128plus(structxorshiftr128plus_state*state){uint64_tx=state->s[0];uint64_tconsty=state->s[1];state->s[0]=y;x^=x<<23;// shift & xorx^=x>>17;// shift & xorx^=y;// xorstate->s[1]=x+y;returnx;}
xoshiro (short for "xor, shift, rotate") and xoroshiro (short for "xor, rotate, shift, rotate") use rotations in addition to shifts. According to Vigna, they are faster and produce better quality output than xorshift. [16] [17]
This class of generator has variants for 32-bit and 64-bit integer and floating point output; for floating point, one takes the upper 53 bits (for binary64) or the upper 23 bits (for binary32), since the upper bits are of better quality than the lower bits in the floating point generators. The algorithms also include a jump
function, which sets the state forward by some number of steps – usually a power of two that allows many threads of execution to start at distinct initial states.
For 32-bit output, xoshiro128** and xoshiro128+ are exactly equivalent to xoshiro256** and xoshiro256+, with uint32_t in place of uint64_t, and with different shift/rotate constants.
More recently, the xoshiro++ generators have been made as an alternative to the xoshiro** generators. They are used in some implementations of Fortran compilers such as GNU Fortran, Java, and Julia. [18]
xoshiro256++ is the family's general-purpose random 64-bit number generator.
/* Adapted from the code included on Sebastiano Vigna's website */#include<stdint.h>uint64_trol64(uint64_tx,intk){return(x<<k)|(x>>(64-k));}structxoshiro256pp_state{uint64_ts[4];};uint64_txoshiro256pp(structxoshiro256pp_state*state){uint64_t*s=state->s;uint64_tconstresult=rol64(s[0]+s[3],23)+s[0];uint64_tconstt=s[1]<<17;s[2]^=s[0];s[3]^=s[1];s[1]^=s[2];s[0]^=s[3];s[2]^=t;s[3]=rol64(s[3],45);returnresult;}
xoshiro256** uses multiplication rather than addition in its output function. It is worth noting, however, that the output function is invertible, allowing the underlying state to be trivially uncovered. [19] It is used in GNU Fortran compiler, Lua (as of Lua 5.4), and the .NET framework (as of .NET 6.0). [18]
/* Adapted from the code included on Sebastiano Vigna's website */#include<stdint.h>uint64_trol64(uint64_tx,intk){return(x<<k)|(x>>(64-k));}structxoshiro256ss_state{uint64_ts[4];};uint64_txoshiro256ss(structxoshiro256ss_state*state){uint64_t*s=state->s;uint64_tconstresult=rol64(s[1]*5,7)*9;uint64_tconstt=s[1]<<17;s[2]^=s[0];s[3]^=s[1];s[1]^=s[2];s[0]^=s[3];s[2]^=t;s[3]=rol64(s[3],45);returnresult;}
xoshiro256+ is approximately 15% faster than xoshiro256**, but the lowest three bits have low linear complexity; therefore, it should be used only for floating point results by extracting the upper 53 bits.
#include<stdint.h>uint64_trol64(uint64_tx,intk){return(x<<k)|(x>>(64-k));}structxoshiro256p_state{uint64_ts[4];};uint64_txoshiro256p(structxoshiro256p_state*state){uint64_t*s=state->s;uint64_tconstresult=s[0]+s[3];uint64_tconstt=s[1]<<17;s[2]^=s[0];s[3]^=s[1];s[1]^=s[2];s[0]^=s[3];s[2]^=t;s[3]=rol64(s[3],45);returnresult;}
If space is at a premium, xoroshiro128** and xoroshiro128+ are equivalent to xoshiro256** and xoshiro256+. These have smaller state spaces, and thus are less useful for massively parallel programs. xoroshiro128+ also exhibits a mild dependency in the population count, generating a failure after 5 TB of output. The authors do not believe that this can be detected in real world programs. Instead of perpetuating Marsaglia's tradition of xorshift as a basic operation, xoroshiro128+ uses a shift/rotate-based linear transformation designed by Sebastiano Vigna in collaboration with David Blackman. The result is a significant improvement in speed and statistical quality. [20]
xoroshiro64** and xoroshiro64* are equivalent to xoroshiro128** and xoroshiro128+. Unlike the xoshiro generators, they are not straightforward ports of their higher-precision counterparts.
The lowest bits of the output generated by xoroshiro128+ have low quality. The authors of xoroshiro128+ acknowledge that it does not pass all statistical tests, stating
This is xoroshiro128+ 1.0, our best and fastest small-state generator for floating-point numbers. We suggest to use its upper bits for floating-point generation, as it is slightly faster than xoroshiro128**. It passes all tests we are aware of except for the four lower bits, which might fail linearity tests (and just those), so if low linear complexity is not considered an issue (as it is usually the case) it can be used to generate 64-bit outputs, too; moreover, this generator has a very mild Hamming-weight dependency making our test (http://prng.di.unimi.it/hwd.php) fail after 5 TB of output; we believe this slight bias cannot affect any application. If you are concerned, use xoroshiro128** or xoshiro256+.
We suggest to use a sign test to extract a random Boolean value, and right shifts to extract subsets of bits.
The state must be seeded so that it is not everywhere zero. If you have a 64-bit seed, we suggest to seed a splitmix64 generator and use its output to fill s.
NOTE: the parameters (a=24, b=16, c=37) of this version give slightly
better results in our test than the 2016 version (a=55, b=14, c=36). [21]
These claims about not passing tests can be confirmed by running PractRand on the input, resulting in output like that shown below:
RNG_test using PractRand version 0.93 RNG = RNG_stdin64, seed = 0xfac83126 test set = normal, folding = standard (64 bit) rng=RNG_stdin64, seed=0xfac83126 length= 128 megabytes (2^27 bytes), time= 2.1 seconds Test Name Raw Processed Evaluation [Low1/64]BRank(12):256(2) R= +3748 p~= 3e-1129 FAIL !!!!!!!! [Low1/64]BRank(12):384(1) R= +5405 p~= 3e-1628 FAIL !!!!!!!! ...and 146 test result(s) without anomalies
Acknowledging the authors go on to say:
We suggest to use a sign test to extract a random Boolean value [21]
Thus, programmers should prefer the highest bits (e.g., making a heads/tails by writing random_number < 0
rather than random_number & 1
). It must be noted, though, that the same test is failed by some instances of the Mersenne Twister and WELL.
The statistical problems extend far beyond the bottom few bits, because it fails the PractRand test even when truncated [22] and fails multiple tests in BigCrush even when the bits are reversed. [23]
In the xoshiro paper, it is recommended to initialize the state of the generators using a generator which is radically different from the initialized generators, as well as one which will never give the "all-zero" state; for shift-register generators, this state is impossible to escape from. [17] [24] The authors specifically recommend using the SplitMix64 generator, from a 64-bit seed, as follows:
#include<stdint.h>structsplitmix64_state{uint64_ts;};uint64_tsplitmix64(structsplitmix64_state*state){uint64_tresult=(state->s+=0x9E3779B97f4A7C15);result=(result^(result>>30))*0xBF58476D1CE4E5B9;result=(result^(result>>27))*0x94D049BB133111EB;returnresult^(result>>31);}structxorshift128_state{uint32_tx[4];};// one could do the same for any of the other generatorsvoidxorshift128_init(structxorshift128_state*state,uint64_tseed){structsplitmix64_statesmstate={seed};uint64_ttmp=splitmix64(&smstate);state->x[0]=(uint32_t)tmp;state->x[1]=(uint32_t)(tmp>>32);tmp=splitmix64(&smstate);state->x[2]=(uint32_t)tmp;state->x[3]=(uint32_t)(tmp>>32);}
^
represents bitwise XOR, and <<
and >>
represent bitwise shifts.A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG), is an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers. The PRNG-generated sequence is not truly random, because it is completely determined by an initial value, called the PRNG's seed. Although sequences that are closer to truly random can be generated using hardware random number generators, pseudorandom number generators are important in practice for their speed in number generation and their reproducibility.
A linear congruential generator (LCG) is an algorithm that yields a sequence of pseudo-randomized numbers calculated with a discontinuous piecewise linear equation. The method represents one of the oldest and best-known pseudorandom number generator algorithms. The theory behind them is relatively easy to understand, and they are easily implemented and fast, especially on computer hardware which can provide modular arithmetic by storage-bit truncation.
The Mersenne Twister is a general-purpose pseudorandom number generator (PRNG) developed in 1997 by Makoto Matsumoto and Takuji Nishimura. Its name derives from the choice of a Mersenne prime as its period length.
A Lagged Fibonacci generator is an example of a pseudorandom number generator. This class of random number generator is aimed at being an improvement on the 'standard' linear congruential generator. These are based on a generalisation of the Fibonacci sequence.
In computing, a linear-feedback shift register (LFSR) is a shift register whose input bit is a linear function of its previous state.
In cryptography, the shrinking generator is a form of pseudorandom number generator intended to be used in a stream cipher. It was published in Crypto 1993 by Don Coppersmith, Hugo Krawczyk and Yishay Mansour.
Serpent is a symmetric key block cipher that was a finalist in the Advanced Encryption Standard (AES) contest, in which it ranked second to Rijndael. Serpent was designed by Ross Anderson, Eli Biham, and Lars Knudsen.
Red Pike is a classified United Kingdom government encryption algorithm, proposed for use by the National Health Service by GCHQ, but designed for a "broad range of applications in the British government" Archived 2004-04-23 at the Wayback Machine. Little is publicly known about Red Pike, except that it is a block cipher with a 64-bit block size and 64-bit key length. According to the academic study of the cipher cited below and quoted in a paper by Ross Anderson and Markus Kuhn, it "uses the same basic operations as RC5" and "has no look-up tables, virtually no key schedule and requires only five lines of code"; "the influence of each key bit quickly cascades" and "each encryption involves of the order of 100 operations". 64 bits of key entropy are not considered secure anymore.
In Unix-like operating systems, /dev/random and /dev/urandom are special files that serve as cryptographically secure pseudorandom number generators (CSPRNGs). They allow access to a CSPRNG that is seeded with entropy from environmental noise, collected from device drivers and other sources. /dev/random typically blocked if there was less entropy available than requested; more recently it usually blocks at startup until sufficient entropy has been gathered, then unblocks permanently. The /dev/urandom device typically was never a blocking device, even if the pseudorandom number generator seed was not fully initialized with entropy since boot. Not all operating systems implement the same methods for /dev/random and /dev/urandom.
The security of cryptographic systems depends on some secret data that is known to authorized persons but unknown and unpredictable to others. To achieve this unpredictability, some randomization is typically employed. Modern cryptographic protocols often require frequent generation of random quantities. Cryptographic attacks that subvert or exploit weaknesses in this process are known as random number generator attacks.
Salsa20 and the closely related ChaCha are stream ciphers developed by Daniel J. Bernstein. Salsa20, the original cipher, was designed in 2005, then later submitted to the eSTREAM European Union cryptographic validation process by Bernstein. ChaCha is a modification of Salsa20 published in 2008. It uses a new round function that increases diffusion and increases performance on some architectures.
CryptGenRandom is a deprecated cryptographically secure pseudorandom number generator function that is included in Microsoft CryptoAPI. In Win32 programs, Microsoft recommends its use anywhere random number generation is needed. A 2007 paper from Hebrew University suggested security problems in the Windows 2000 implementation of CryptGenRandom. Microsoft later acknowledged that the same problems exist in Windows XP, but not in Vista. Microsoft released a fix for the bug with Windows XP Service Pack 3 in mid-2008.
In computer science, multiply-with-carry (MWC) is a method invented by George Marsaglia for generating sequences of random integers based on an initial set from two to many thousands of randomly chosen seed values. The main advantages of the MWC method are that it invokes simple computer integer arithmetic and leads to very fast generation of sequences of random numbers with immense periods, ranging from around to .
The Lehmer random number generator, sometimes also referred to as the Park–Miller random number generator, is a type of linear congruential generator (LCG) that operates in multiplicative group of integers modulo n. The general formula is
RDRAND
is an instruction for returning random numbers from an Intel on-chip hardware random number generator which has been seeded by an on-chip entropy source. It is also known as Intel Secure Key Technology, codenamed Bull Mountain. Intel introduced the feature around 2012, and AMD added support for the instruction in June 2015.
TestU01 is a software library, implemented in the ANSI C language, that offers a collection of utilities for the empirical randomness testing of random number generators (RNGs). The library was first introduced in 2007 by Pierre L’Ecuyer and Richard Simard of the Université de Montréal.
The spectral test is a statistical test for the quality of a class of pseudorandom number generators (PRNGs), the linear congruential generators (LCGs). LCGs have a property that when plotted in 2 or more dimensions, lines or hyperplanes will form, on which all possible outputs can be found. The spectral test compares the distance between these planes; the further apart they are, the worse the generator is. As this test is devised to study the lattice structures of LCGs, it can not be applied to other families of PRNGs.
KISS (Keep it Simple Stupid) is a family of pseudorandom number generators introduced by George Marsaglia. Starting from 1998 Marsaglia posted on various newsgroups including sci.math, comp.lang.c, comp.lang.fortran and sci.stat.math several versions of the generators. All KISS generators combine three or four independent random number generators with a view to improving the quality of randomness. KISS generators produce 32-bit or 64-bit random integers, from which random floating-point numbers can be constructed if desired. The original 1993 generator is based on the combination of a linear congruential generator and of two linear feedback shift-register generators. It has a period 295, good speed and good statistical properties; however, it fails the LinearComplexity test in the Crush and BigCrush tests of the TestU01 suite. A newer version from 1999 is based on a linear congruential generator, a 3-shift linear feedback shift-register and two multiply-with-carry generators. It is 10–20% slower than the 1993 version but has a larger period 2123 and passes all tests in TestU01. In 2009 Marsaglia presented a version based on 64-bit integers (appropriate for 64-bit processors) which combines a multiply-with-carry generator, a Xorshift generator and a linear congruential generator. It has a period of around 2250 (around 1075).
A permuted congruential generator (PCG) is a pseudorandom number generation algorithm developed in 2014 by Dr. M.E. O'Neill which applies an output permutation function to improve the statistical properties of a modulo-2n linear congruential generator. It achieves excellent statistical performance with small and fast code, and small state size.
We report that these scrambled generators systematically fail Big Crush—specifically the linear-complexity and matrix-rank tests that detect linearity—when taking the 32 lowest-order bits in reverse order from each 64-bit word.