A permuted congruential generator (PCG) is a pseudorandom number generation algorithm developed in 2014 by Dr. M.E. O'Neill which applies an output permutation function to improve the statistical properties of a modulo-2n linear congruential generator (LCG). It achieves excellent statistical performance [1] [2] [3] [4] with small and fast code, and small state size. [5]
LCGs with a power-of-2 modulus are simple, efficient, and have uniformly distributed binary outputs, but suffer from a well-known problem of short periods in the low-order bits. [5] : 31–34
A PCG addresses this by adding an output transformation between the LCG state and the PCG output. This adds two elements to the LCG:
The variable rotation ensures that all output bits depend on the most-significant bit of state, so all output bits have full period.
The PCG family includes a number of variants. The core LCG is defined for widths from 8 to 128 bits[ citation needed ], although only 64 and 128 bits are recommended for practical use; smaller sizes are for statistical tests of the technique.
The additive constant in the LCG can be varied to produce different streams. The constant is an arbitrary odd integer, [6] so it does not need to be stored explicitly; the address of the state variable itself (with the low bit set) can be used.
There are several different output transformations defined. All perform well, but some have a larger margin than others. [5] : 39 They are built from the following components:
x ^= x >> constant
. The constant is chosen to be half of the bits (rounded down) not discarded by the fllowing RR or RS operation.Each of these operations is either invertible (and thus one-to-one) or a truncation (and thus 2k-to-one for some fixed k), so their composition maps the same fixed number of input states to each output value. This preserves the equidistribution of the underlying LCG.
These are combined into the following recommended output transformations, illustrated here in their most common sizes:
count = (int)(x >> 59); x ^= x >> 18; return rotr32((uint32_t)(x >> 27), count);
.count = (int)(x >> 61); x ^= x >> 22; return (uint32_t)(x >> (29 - count));
.count = (int)(x >> 122); x64 = (uint64_t)(x ^ (x >> 64)); return rotr64(x64, count);
count=(int)(x >> 28); x ^= x >> (4 + count); x *= 277803737u; return x ^ (x >> 22);
count=(int)(x >> 59); x ^= x >> (5 + count); x *= 12605985483714917081u; return x ^ (x >> 43);
count = (int)(x >> 122); low64 = rotr64((uint64_t)(x ^ (x >> 64)), count); high64 = rotr64((uint64_t)(x >> 64), low64 & 63); return (uint128_t)high64 << 64 | low64;
Finally, if a generator period longer than 2128 is required, the generator can be extended with an array of sub-generators. One is chosen (in rotation) to be added to the main generator's output, and every time the main generator's state reaches zero, the sub-generators are cycled in a pattern which provides a period equal to 2 to the power of the total state size.
The generator recommended for most users [5] : 43 is PCG-XSH-RR with 64-bit state and 32-bit output. It can be implemented as:
#include<stdint.h>staticuint64_tstate=0x4d595df4d0f33173;// Or something seed-dependentstaticuint64_tconstmultiplier=6364136223846793005u;staticuint64_tconstincrement=1442695040888963407u;// Or an arbitrary odd constantstaticuint32_trotr32(uint32_tx,unsignedr){returnx>>r|x<<(-r&31);}uint32_tpcg32(void){uint64_tx=state;unsignedcount=(unsigned)(x>>59);// 59 = 64 - 5state=x*multiplier+increment;x^=x>>18;// 18 = (64 - 27)/2returnrotr32((uint32_t)(x>>27),count);// 27 = 32 - 5}voidpcg32_init(uint64_tseed){state=seed+increment;(void)pcg32();}
The generator applies the output transformation to the initial state rather than the final state in order to increase the available instruction-level parallelism to maximize performance on modern superscalar processors. [5] : 43
A slightly faster version eliminates the increment, reducing the LCG to a multiplicative (Lehmer-style) generator with a period of only 262, and uses the weaker XSH-RS output function:
staticuint64_tmcg_state=0xcafef00dd15ea5e5u;// Must be oddstaticuint64_tconstmultiplier=6364136223846793005u;uint32_tpcg32_fast(void){uint64_tx=mcg_state;unsignedcount=(unsigned)(x>>61);// 61 = 64 - 3mcg_state=x*multiplier;x^=x>>22;return(uint32_t)(x>>(22+count));// 22 = 32 - 3 - 7}voidpcg32_fast_init(uint64_tseed){mcg_state=2*seed+1;(void)pcg32_fast();}
The time saving is minimal, as the most expensive operation (the 64×64-bit multiply) remains, so the normal version is preferred except in extremis. Still, this faster version also passes statistical tests. [4]
When executing on a 32-bit processor, the 64×64-bit multiply must be implemented using three 32×32→64-bit multiply operations. To reduce that to two, there are 32-bit multipliers which perform almost as well as the 64-bit one, such as 0xf13283ad [6] , 0xffffffff0e703b65 or 0xf2fc5985.
O'Neill proposes testing PRNGs by applying statistical tests to their reduced-size variants and determining the minimum number of internal state bits required to pass. [7] TestU01's BigCrush examines enough data to detect a period of 235, so even an ideal generator requires 36 bits of state to pass it. Some very poor generators can pass if given a large enough state; [8] passing despite a small state is a measure of an algorithm's quality, and shows how large a safety margin exists between that lower limit and the state size used in practical applications.
PCG-RXS-M-XS (with 32-bit output) passes BigCrush with 36 bits of state (the minimum possible), PCG-XSH-RR (pcg32()
above) requires 39, and PCG-XSH-RS (pcg32_fast()
above) requires 49 bits of state. For comparison, xorshift*, one of the best of the alternatives, requires 40 bits of state, [5] : 19 and Mersenne twister fails despite 19937 bits of state. [9]
It has been shown that it is practically possible (with a large computation) to recover the seed of the pseudo-random generator given 512 consecutive output bytes. [10] This implies that it is practically possible to predict the rest of the pseudo-random stream given 512 bytes.