Correlation attack

Last updated

Correlation attacks are a class of cryptographic known-plaintext attacks for breaking stream ciphers whose keystream s are generated by combining the output of several linear-feedback shift registers (LFSRs) using a Boolean function. Correlation attacks exploit a statistical weakness that arises from the specific Boolean function chosen for the keystream. While some Boolean functions are vulnerable to correlation attacks, stream ciphers generated using such functions are not inherently insecure.

Contents

Explanation

Correlation attacks become possible when a significant correlation exists between the output state of an individual LFSR in the keystream generator and the output of the Boolean function that combines the output states of all the LFSRs. These attacks are employed in combination with partial knowledge of the keystream, which is derived from partial knowledge of the plaintext. The two are then compared using an XOR logic gate. This vulnerability allows an attacker to brute-force the key for the individual LFSR and the rest of the system separately. For instance, in a keystream generator where four 8-bit LFSRs are combined to produce the keystream, and if one of the registers is correlated to the Boolean function output, it becomes possible to brute force it first, followed by the remaining three LFSRs. As a result, the total attack complexity becomes 28 + 224.

Compared to the cost of launching a brute-force attack on the entire system, with complexity 232, this represents an attack effort saving factor of just under 256. If a second register is correlated with the function, the process may be repeated and decrease the attack complexity down to 28 + 28 + 216 for an effort saving factor of just under 65028.

Example

Geffe generator

One example is the Geffe generator, which consists of three LFSRs: LFSR-1, LFSR-2, and LFSR-3. Let these registers be denoted as: , , and , respectively. Then, the Boolean function combining the three registers to provide the generator output is given by (i.e. ( AND ) XOR (NOT AND )). There are 23 = 8 possible values for the outputs of the three registers, and the value of this combining function for each of them is shown in the table below:

Boolean function output table
0000
0011
0100
0111
1000
1010
1101
1111

Consider the output of the third register, . The table above shows that of the 8 possible outputs of , 6 are equal to the corresponding value of the generator output, . In 75% of all possible cases, . Thus LFSR-3 is 'correlated' with the generator. This is a weakness that may be exploited as follows:

An interception can be made on the cipher text of a plain text which has been encrypted by a stream cipher using a Geffe generator as its keystream generator, i.e. for , where is the output of LFSR-1 at time , etc. It's also possible that part of the plain text, e.g. , the first 32 bits of the plaintext (corresponding to 4 ASCII characters of text). This is not entirely improbable considering plain text is a valid XML file, for instance, the first 4 ASCII characters must be "<xml". Similarly, many file formats or network protocols have very standard headers or footers. Given the intercepted and our known/guessed , we may easily find for by XOR-ing the two together. This makes the 32 consecutive bits of the generator output easy to determine.

This enables a brute-force search of the space of possible keys (initial values) for LFSR-3 (assuming we know the tapped bits of LFSR-3, an assumption which is in line with Kerckhoffs' principle). For any given key in the key space, we may quickly generate the first 32 bits of LFSR-3's output and compare these to our recovered 32 bits of the entire generator's output. Because we have established earlier that there is a 75% correlation between the output of LFSR-3 and the generator, we know we have correctly guessed the key for LFSR-3 if approximately 24 of the first 32 bits of LFSR-3 output will match up with the corresponding bits of generator output. If we have guessed incorrectly, we should expect roughly half, or 16, of the first 32 bits of these two sequences to match. Thus we may recover the key for LFSR-3 independently of the keys of LFSR-1 and LFSR-2. At this stage we have reduced the problem of brute forcing a system of 3 LFSRs to the problem of brute forcing a single LFSR and then a system of 2 LFSRs. The amount of effort saved here depends on the length of the LFSRs. For realistic values, it is a very substantial saving and can make brute-force attacks very practical.

Observe in the table above that also agrees with the generator output 6 times out of 8, again a correlation of 75% correlation between and the generator output. We may begin a brute force attack against LFSR-2 independently of the keys of LFSR-1 and LFSR-3, leaving only LFSR-1 unbroken. Thus, we are able to break the Geffe generator with as much effort as required to brute force 3 entirely independent LFSRs. This means the Geffe generator is a very weak generator and should never be used to generate stream cipher keystreams.

Note from the table above that agrees with the generator output 4 times out of 8—a 50% correlation. We cannot use this to brute force LFSR-1 independently of the others: the correct key will yield output that agrees with the generator output 50% of the time, but on average so will an incorrect key. This represents the ideal situation from a security perspective—the combining function should be chosen so the correlation between each variable and the combining function's output is as close as possible to 50%. In practice, it may be difficult to find a function that achieves this without sacrificing other design criteria, e.g., period length, so a compromise may be necessary.

Clarifying the statistical nature of the attack

While the above example illustrates well the relatively simple concepts behind correlation attacks, it perhaps simplifies the explanation of precisely how the brute forcing of individual LFSRs proceeds. Incorrectly guessed keys will generate LFSR output that agrees with the generator output roughly 50% of the time because, given two random bit sequences of a given length, the probability of agreement between the sequences at any particular bit is 0.5. However, specific individual incorrect keys may well generate LFSR output that agrees with the generator output more or less often than exactly 50% of the time. This is particularly salient in the case of LFSRs whose correlation with the generator is not especially strong; for small enough correlations, it is certainly not outside the realm of possibility that an incorrectly guessed key will also lead to LFSR output that agrees with the desired number of bits of the generator output. Thus, it may not be possible to identify the unique key to that LFSR. It may be possible to identify a number of potential keys, however, which is still a significant breach of the cipher's security. Moreover, given a megabyte of known plain text, the situation would be substantially different. An incorrect key may generate LFSR output that agrees with more than 512 kilobytes of the generator output but is not likely to generate output that agrees with as much as 768 kilobytes of the generator output as a correctly guessed key would. As a rule, the weaker the correlation between an individual register and the generator output, the more known plain text is required to find that register's key with a high degree of confidence. Estimates of the length of known plain text required for a given correlation can be calculated using the binomial distribution.

Higher order correlations

Definition

The correlations which were exploited in the example attack on the Geoff generator are examples of what are called first order correlations: they are correlations between the value of the generator output and an individual LFSR. It is possible to define higher-order correlations in addition to these. For instance, it may be possible that while a given Boolean function has no strong correlations with any of the individual registers it combines, a significant correlation may exist between some Boolean function of two of the registers, e.g., . This would be an example of a second order correlation. Third order correlations and higher can be defined in this way.

Higher-order correlation attacks can be more powerful than single-order correlation attacks, however, this effect is subject to a "law of limiting returns". The table below shows a measure of the computational cost for various attacks on a keystream generator consisting of eight 8-bit LFSRs combined by a single Boolean function. Understanding the calculation of cost is relatively straightforward: the leftmost term of the sum represents the size of the key space for the correlated generators, and the rightmost term represents the size of the key space for the remaining generators.

Generator attack effort
AttackEffort (size of keyspace)
Brute force
Single 1st order correlation attack
Single 2nd order correlation attack
Single 3rd order correlation attack
Single 4th order correlation attack
Single 5th order correlation attack
Single 6th order correlation attack
Single 7th order correlation attack

While higher-order correlations lead to more powerful attacks, they are also more difficult to find, as the space of available Boolean functions to correlate against the generator output increases as the number of arguments to the function does.

Terminology

A Boolean function of n variables is said to be "m-th order correlation immune", or to have "mth order correlation immunity" for some integer m, if no significant correlation exists between the function's output and any Boolean function of m of its inputs. For example, a Boolean function that has no first-order or second-order correlations, but which does have a third-order correlation exhibits 2nd order correlation immunity. Obviously, higher correlation immunity makes a function more suitable for use in a keystream generator (although this is not the only thing that needs to be considered).

Siegenthaler showed that the correlation immunity m of a Boolean function of algebraic degree d of n variables satisfies ; for a given set of input variables, this means that a high algebraic degree will restrict the maximum possible correlation immunity. Furthermore, if the function is balanced then . [1]

It follows that it is impossible for a function of n variables to be nth order correlation immune. This also follows from the fact that any such function can be written using a Reed-Muller basis as a combination of XORs of the input functions.

Cipher design implications

Given the probable extreme severity of a correlation attack's impact on a stream cipher's security, it should be essential to test a candidate Boolean combination function for correlation immunity before deciding to use it in a stream cipher. However, it is important to note that high correlation immunity is a necessary, but not sufficient condition for a Boolean function to be appropriate for use in a keystream generator. There are other issues to consider, for example, whether or not the function is balanced - whether it outputs as many or roughly as many 1's as it does 0's when all possible inputs are considered.

Research has been conducted into methods for easily generating Boolean functions of a given size which are guaranteed to have at least some particular order of correlation immunity. This research has uncovered links between correlation immune Boolean functions and error correcting codes. [2]

See also

Related Research Articles

In cryptography, a block cipher is a deterministic algorithm that operates on fixed-length groups of bits, called blocks. Block ciphers are the elementary building blocks of many cryptographic protocols. They are ubiquitous in the storage and exchange of data, where such data is secured and authenticated via encryption.

Differential cryptanalysis is a general form of cryptanalysis applicable primarily to block ciphers, but also to stream ciphers and cryptographic hash functions. In the broadest sense, it is the study of how differences in information input can affect the resultant difference at the output. In the case of a block cipher, it refers to a set of techniques for tracing differences through the network of transformation, discovering where the cipher exhibits non-random behavior, and exploiting such properties to recover the secret key.

In cryptography, RC4 is a stream cipher. While it is remarkable for its simplicity and speed in software, multiple vulnerabilities have been discovered in RC4, rendering it insecure. It is especially vulnerable when the beginning of the output keystream is not discarded, or when nonrandom or related keys are used. Particularly problematic uses of RC4 have led to very insecure protocols such as WEP.

<span class="mw-page-title-main">Stream cipher</span> Type of symmetric key cipher

A stream cipher is a symmetric key cipher where plaintext digits are combined with a pseudorandom cipher digit stream (keystream). In a stream cipher, each plaintext digit is encrypted one at a time with the corresponding digit of the keystream, to give a digit of the ciphertext stream. Since encryption of each digit is dependent on the current state of the cipher, it is also known as state cipher. In practice, a digit is typically a bit and the combining operation is an exclusive-or (XOR).

In computing, a linear-feedback shift register (LFSR) is a shift register whose input bit is a linear function of its previous state.

A cryptographically secure pseudorandom number generator (CSPRNG) or cryptographic pseudorandom number generator (CPRNG) is a pseudorandom number generator (PRNG) with properties that make it suitable for use in cryptography. It is also loosely known as a cryptographic random number generator (CRNG).

A5/1 is a stream cipher used to provide over-the-air communication privacy in the GSM cellular telephone standard. It is one of several implementations of the A5 security protocol. It was initially kept secret, but became public knowledge through leaks and reverse engineering. A number of serious weaknesses in the cipher have been identified.

<span class="mw-page-title-main">Boolean function</span> Function returning one of only two values

In mathematics, a Boolean function is a function whose arguments and result assume values from a two-element set. Alternative names are switching function, used especially in older computer science literature, and truth function, used in logic. Boolean functions are the subject of Boolean algebra and switching theory.

E0 is a stream cipher used in the Bluetooth protocol. It generates a sequence of pseudorandom numbers and combines it with the data using the XOR operator. The key length may vary, but is generally 128 bits.

A nonlinear-feedback shift register (NLFSR) is a shift register whose input bit is a non-linear function of its previous state.

Grain is a stream cipher submitted to eSTREAM in 2004 by Martin Hell, Thomas Johansson and Willi Meier. It has been selected for the final eSTREAM portfolio for Profile 2 by the eSTREAM project. Grain is designed primarily for restricted hardware environments. It accepts an 80-bit key and a 64-bit IV. The specifications do not recommend a maximum length of output per pair. A number of potential weaknesses in the cipher have been identified and corrected in Grain 128a which is now the recommended cipher to use for hardware environments providing both 128bit security and authentication.

In mathematics, the correlation immunity of a Boolean function is a measure of the degree to which its outputs are uncorrelated with some subset of its inputs. Specifically, a Boolean function is said to be correlation-immune of order m if every subset of m or fewer variables in is statistically independent of the value of .

In cryptography, Achterbahn is the name of a synchronous stream cipher algorithm submitted to the eSTREAM Project of the eCRYPT network. In the final specification the cipher is called ACHTERBAHN-128/80, because it supports the key lengths of 80 bits and 128 bits, respectively. Achterbahn was developed by Berndt Gammel, Rainer Göttfert and Oliver Kniffler. Achterbahn means rollercoaster, though a literal translation of the term would be eight-track, which indicates that the cipher can encrypt eight bit streams in parallel.

In cryptography, a distinguishing attack is any form of cryptanalysis on data encrypted by a cipher that allows an attacker to distinguish the encrypted data from random data. Modern symmetric-key ciphers are specifically designed to be immune to such an attack. In other words, modern encryption schemes are pseudorandom permutations and are designed to have ciphertext indistinguishability. If an algorithm is found that can distinguish the output from random faster than a brute force search, then that is considered a break of the cipher.

In cryptography, an alternating step generator (ASG) is a cryptographic pseudorandom number generator used in stream ciphers, based on three linear-feedback shift registers. Its output is a combination of two LFSRs which are stepped (clocked) in an alternating fashion, depending on the output of a third LFSR.

The summation generator, created in 1985, by Rainer Rueppel, was a cryptography and security front-runner in the late 1980s. It operates by taking the output of two LFSRs through an adder with carry. The operation's strength is that it is nonlinear. However, through the early 1990s various attacks against the summation generator eventually led to its fall to a correlation attack. In 1995 Klapper and Goresky were able to determine the summation generator's sequence in only 219 bits.

<span class="mw-page-title-main">Bent function</span> Special type of Boolean function

In the mathematical field of combinatorics, a bent function is a Boolean function that is maximally non-linear; it is as different as possible from the set of all linear and affine functions when measured by Hamming distance between truth tables. Concretely, this means the maximum correlation between the output of the function and a linear function is minimal. In addition, the derivatives of a bent function are balanced Boolean functions, so for any change in the input variables there is a 50 percent chance that the output value will change.

The Grain 128a stream cipher was first purposed at Symmetric Key Encryption Workshop (SKEW) in 2011 as an improvement of the predecessor Grain 128, which added security enhancements and optional message authentication using the Encrypt & MAC approach. One of the important features of the Grain family is that the throughput can be increased at the expense of additional hardware. Grain 128a is designed by Martin Ågren, Martin Hell, Thomas Johansson and Willi Meier.

<span class="mw-page-title-main">Simon (cipher)</span> Family of lightweight block ciphers

Simon is a family of lightweight block ciphers publicly released by the National Security Agency (NSA) in June 2013. Simon has been optimized for performance in hardware implementations, while its sister algorithm, Speck, has been optimized for software implementations.

A time/memory/data tradeoff attack is a type of cryptographic attack where an attacker tries to achieve a situation similar to the space–time tradeoff but with the additional parameter of data, representing the amount of data available to the attacker. An attacker balances or reduces one or two of those parameters in favor of the other one or two. This type of attack is very difficult, so most of the ciphers and encryption schemes in use were not designed to resist it.

References

  1. T. Siegenthaler (September 1984). "Correlation-Immunity of Nonlinear Combining Functions for Cryptographic Applications". IEEE Transactions on Information Theory. 30 (5): 776–780. doi:10.1109/TIT.1984.1056949.
  2. Chuan-Kun Wu and Ed Dawson, Construction of Correlation Immune Boolean Functions Archived 2006-09-07 at the Wayback Machine , ICICS97