Piling-up lemma


In cryptanalysis, the piling-up lemma is a principle used in linear cryptanalysis to construct linear approximations to the action of block ciphers. It was introduced by Mitsuru Matsui (1993) as an analytical tool for linear cryptanalysis. [1] The lemma states that the bias (deviation of the expected value from 1/2) of a linear Boolean function (XOR-clause) of independent binary random variables is related to the product of the input biases: [2]

$$\epsilon(X_1 \oplus X_2 \oplus \cdots \oplus X_n) = 2^{n-1} \prod_{i=1}^{n} \epsilon(X_i)$$

or

$$I(X_1 \oplus X_2 \oplus \cdots \oplus X_n) = \prod_{i=1}^{n} I(X_i)$$

where $\epsilon \in [-\tfrac12, \tfrac12]$ is the bias (towards zero [3]) and $I = 2\epsilon$ the imbalance: [4] [5]

$$\epsilon(X) = P(X = 0) - \tfrac12, \qquad I(X) = P(X = 0) - P(X = 1) = 2\,\epsilon(X).$$

Conversely, if the lemma does not hold, then the input variables are not independent. [6]
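As a sanity check, the lemma can be verified by exhaustive enumeration. The following minimal Python sketch (illustrative, not part of the original presentation) computes the exact bias of an XOR of independent biased bits and compares it with the product formula:

```python
from itertools import product

def xor_bias(biases):
    """Exact bias of X1 XOR ... XOR Xn for independent bits,
    where biases[i] = P(X_i = 0) - 1/2."""
    p_zero = 0.0  # accumulates P(X1 XOR ... XOR Xn = 0)
    for bits in product((0, 1), repeat=len(biases)):
        prob = 1.0
        for bit, eps in zip(bits, biases):
            prob *= (0.5 + eps) if bit == 0 else (0.5 - eps)
        if sum(bits) % 2 == 0:  # even parity <=> the XOR equals 0
            p_zero += prob
    return p_zero - 0.5

eps = [0.25, 0.125, -0.25]  # example input biases (dyadic, so exact in floats)
lhs = xor_bias(eps)                                   # exact bias of the XOR
rhs = 2 ** (len(eps) - 1) * eps[0] * eps[1] * eps[2]  # piling-up lemma
print(lhs, rhs)                                       # both print -0.03125
```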

Interpretation

The lemma implies that XOR-ing independent binary variables always reduces the bias (or at least does not increase it); moreover, the output is unbiased if and only if there is at least one unbiased input variable.

Note that for two variables the quantity $I(X \oplus Y)$ is a correlation measure of $X$ and $Y$, equal to $P(X = Y) - P(X \neq Y)$; $I(X)$ can be interpreted as the correlation of $X$ with $0$.
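For example (a worked instance added for illustration), two independent variables each equal to $0$ with probability $3/4$ have $\epsilon_1 = \epsilon_2 = \tfrac14$, so

$$\epsilon_{1,2} = 2\,\epsilon_1 \epsilon_2 = 2 \cdot \tfrac14 \cdot \tfrac14 = \tfrac18,$$

i.e. the XOR equals $0$ with probability $5/8$: a smaller bias than either input, as the lemma predicts.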

Expected value formulation

The piling-up lemma can be expressed more naturally when the random variables take values in $\{-1, 1\}$. If we introduce the variables $\chi_i = 1 - 2X_i = (-1)^{X_i}$ (mapping 0 to 1 and 1 to $-1$) then, by inspection, the XOR-operation transforms into a product:

$$\chi_{1,2,\ldots,n} = \chi_1 \chi_2 \cdots \chi_n = (-1)^{X_1 \oplus X_2 \oplus \cdots \oplus X_n}$$

and since the expected values are the imbalances, $E(\chi_i) = I(X_i)$, the lemma now states:

$$E\left(\chi_{1,2,\ldots,n}\right) = \prod_{i=1}^{n} E(\chi_i),$$

which is a known property of the expected value for independent variables.
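A one-line check of the sign trick (added for illustration): over bits, the map $x \mapsto (-1)^x$ turns XOR into multiplication.

```python
chi = lambda bit: (-1) ** bit  # maps 0 -> +1, 1 -> -1

# XOR becomes a product under chi: chi(x ^ y) == chi(x) * chi(y),
# which is why E[chi] (the imbalance) multiplies for independent bits.
for x in (0, 1):
    for y in (0, 1):
        assert chi(x ^ y) == chi(x) * chi(y)
```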

For dependent variables the above formulation gains a (positive or negative) covariance term, so the lemma does not hold. In fact, since two Bernoulli variables are independent if and only if they are uncorrelated (i.e. have zero covariance; see uncorrelatedness), we have the converse of the piling-up lemma: if it does not hold, the variables are not independent (they are correlated).
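A simple counterexample (added for illustration) makes the converse concrete: take $X_2 = X_1$ with $X_1$ unbiased ($\epsilon_1 = \epsilon_2 = 0$). Then

$$P(X_1 \oplus X_2 = 0) = 1, \qquad \epsilon_{1,2} = \tfrac12 \neq 2\,\epsilon_1 \epsilon_2 = 0,$$

and the failure of the lemma certifies that $X_1$ and $X_2$ are not independent.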

Boolean derivation

The piling-up lemma allows the cryptanalyst to determine the probability that the equality:

$$X_1 \oplus X_2 \oplus \cdots \oplus X_n = 0$$

holds, where the X's are binary variables (that is, bits: either 0 or 1).

Let P(A) denote "the probability that A is true". If it equals one, A is certain to happen, and if it equals zero, A cannot happen. First of all, we consider the piling-up lemma for two binary variables, where $P(X_1 = 0) = p_1$ and $P(X_2 = 0) = p_2$.

Now, we consider:

$$P(X_1 \oplus X_2 = 0).$$

Due to the properties of the xor operation, this is equivalent to

$$P(X_1 = X_2).$$

$X_1 = X_2 = 0$ and $X_1 = X_2 = 1$ are mutually exclusive events, so we can say

$$P(X_1 = X_2) = P(X_1 = X_2 = 0) + P(X_1 = X_2 = 1) = P(X_1 = 0,\, X_2 = 0) + P(X_1 = 1,\, X_2 = 1).$$

Now, we must make the central assumption of the piling-up lemma: the binary variables we are dealing with are independent; that is, the state of one has no effect on the state of any of the others. Thus we can expand the probability function as follows:

$$P(X_1 \oplus X_2 = 0) = P(X_1 = 0)\,P(X_2 = 0) + P(X_1 = 1)\,P(X_2 = 1) = p_1 p_2 + (1 - p_1)(1 - p_2).$$

Now we express the probabilities $p_1$ and $p_2$ as $\tfrac12 + \epsilon_1$ and $\tfrac12 + \epsilon_2$, where the $\epsilon$'s are the probability biases: the amount by which the probability deviates from $\tfrac12$. Substituting,

$$P(X_1 \oplus X_2 = 0) = \left(\tfrac12 + \epsilon_1\right)\left(\tfrac12 + \epsilon_2\right) + \left(\tfrac12 - \epsilon_1\right)\left(\tfrac12 - \epsilon_2\right) = \tfrac12 + 2\,\epsilon_1 \epsilon_2.$$

Thus the probability bias $\epsilon_{1,2}$ for the XOR sum above is $2\,\epsilon_1 \epsilon_2$.

This formula can be extended to more X's as follows:

$$P(X_1 \oplus X_2 \oplus \cdots \oplus X_n = 0) = \tfrac12 + 2^{n-1} \prod_{i=1}^{n} \epsilon_i.$$

Note that if any of the $\epsilon$'s is zero, that is, one of the binary variables is unbiased, the entire probability function will be unbiased: equal to $\tfrac12$.
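As a worked instance (added for illustration), three independent bits each with bias $\epsilon_i = \tfrac14$ give

$$P(X_1 \oplus X_2 \oplus X_3 = 0) = \tfrac12 + 2^{2}\left(\tfrac14\right)^{3} = \tfrac12 + \tfrac{1}{16} = \tfrac{9}{16}.$$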

A related, slightly different definition of the bias is $\epsilon'_i = P(X_i = 1) - P(X_i = 0)$, in fact minus two times the previous value. The advantage is that now, with

$$\epsilon'_{1,2,\ldots,n} = P(X_1 \oplus \cdots \oplus X_n = 1) - P(X_1 \oplus \cdots \oplus X_n = 0),$$

we have

$$\epsilon'_{1,2,\ldots,n} = (-1)^{n+1}\,\epsilon'_1 \epsilon'_2 \cdots \epsilon'_n,$$

so that, up to the sign factor, [3] adding (XOR-ing) random variables amounts to multiplying their (2nd definition) biases, with no $2^{n-1}$ factor.
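Continuing the worked instance above (illustrative): with $n = 2$ and $\epsilon_1 = \epsilon_2 = \tfrac14$, we get $\epsilon'_1 = \epsilon'_2 = -\tfrac12$ and

$$\epsilon'_{1,2} = (-1)^{3}\left(-\tfrac12\right)\left(-\tfrac12\right) = -\tfrac14,$$

consistent with $\epsilon_{1,2} = 2\,\epsilon_1 \epsilon_2 = \tfrac18$ under $\epsilon' = -2\epsilon$.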

Practice

In practice, the X's are approximations to the S-boxes (substitution components) of block ciphers. Typically, X values are inputs to the S-box and Y values are the corresponding outputs. By simply looking at the S-boxes, the cryptanalyst can tell what the probability biases are. The trick is to find combinations of input and output values whose probabilities are as close to zero or one as possible: the closer the probability of an approximation is to zero or one, the more helpful the approximation is in linear cryptanalysis.
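As a concrete illustration (a sketch under assumed parameters, not from the original article; the 4-bit S-box and the masks below are toy examples), the bias of one linear approximation of an S-box can be read off by counting, over all inputs, how often a chosen input-mask/output-mask parity relation holds:

```python
# Toy 4-bit S-box, chosen only for illustration.
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]

def parity(x):
    """XOR of the bits of x."""
    return bin(x).count("1") % 2

def approximation_bias(in_mask, out_mask):
    """Bias of the linear approximation
    parity(in_mask & X) = parity(out_mask & SBOX[X]),
    computed exactly over all 16 inputs."""
    hits = sum(parity(x & in_mask) == parity(SBOX[x] & out_mask)
               for x in range(16))
    return hits / 16 - 0.5

# Hypothetical masks; in an attack one searches for masks whose bias
# is as far from zero as possible.
print(approximation_bias(0xB, 0x5))
```

The piling-up lemma is then used to combine such per-S-box biases across rounds into the bias of a full-cipher approximation, subject to the independence caveat discussed below.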

However, in practice, the binary variables are not independent, as is assumed in the derivation of the piling-up lemma. This consideration has to be kept in mind when applying the lemma; it is not an automatic cryptanalysis formula.



References

  1. Matsui, Mitsuru (1994). "Linear Cryptanalysis Method for DES Cipher". Advances in Cryptology – EUROCRYPT '93. Lecture Notes in Computer Science. Vol. 765. pp. 386–397. doi:10.1007/3-540-48285-7_33. ISBN 978-3-540-57600-6. S2CID 533517.
  2. Li, Qin; Boztaş, S. (December 2007). "Extended Linear Cryptanalysis and Extended Piling-up Lemma" (PDF). ISC Turkey. S2CID 5508314. Archived from the original (PDF) on 2017-01-17.
  3. The bias (and imbalance) may also be taken as an absolute value; if the bias with flipped sign (bias towards one) is used, the lemma needs an additional $(-1)^{n+1}$ sign factor in the right-hand side.
  4. Harpes, Carlo; Kramer, Gerhard G.; Massey, James L. (1995). "A Generalization of Linear Cryptanalysis and the Applicability of Matsui's Piling-up Lemma". Advances in Cryptology – EUROCRYPT '95. Lecture Notes in Computer Science. Vol. 921. pp. 24–38. doi:10.1007/3-540-49264-X_3. ISBN 978-3-540-59409-3.
  5. Kukorelly, Zsolt (1999). "The Piling-Up Lemma and Dependent Random Variables". Cryptography and Coding. Lecture Notes in Computer Science. Vol. 1746. pp. 186–190. doi:10.1007/3-540-46665-7_22. ISBN 978-3-540-66887-9.
  6. Nyberg, Kaisa (February 26, 2008). "Linear Cryptanalysis (Cryptology lecture)" (PDF). Helsinki University of Technology, Laboratory for Theoretical Computer Science.