Next-bit test

In cryptography and the theory of computation, the next-bit test [1] is a test against pseudo-random number generators. We say that a sequence of bits passes the next-bit test at any position $i$ in the sequence if no attacker who knows the first $i$ bits (but not the seed) can predict the $(i+1)$st bit with reasonable computational power.

Precise statement(s)

Let $P$ be a polynomial, and $S = \{S_k\}$ be a collection of sets such that $S_k$ contains $P(k)$-bit long sequences. Moreover, let $\mu_k$ be the probability distribution of the strings in $S_k$.

We now define the next-bit test in two different ways.

Boolean circuit formulation

A predicting collection [2] $C = \{C_k^i\}$ is a collection of boolean circuits, such that each circuit $C_k^i$ has fewer than $P_C(k)$ gates, for some polynomial $P_C$, and exactly $i$ inputs. Let $p_{k,i}^C$ be the probability that, on input the first $i$ bits of $s$, a string randomly selected in $S_k$ with probability $\mu_k(s)$, the circuit correctly predicts $s_{i+1}$, i.e.:

$$p_{k,i}^C = \mathcal{P}\left[ C_k^i(s_1 \ldots s_i) = s_{i+1} \,\middle|\, s \in S_k \text{ chosen with probability } \mu_k(s) \right]$$

Now, we say that $\{S_k\}$ passes the next-bit test if for any predicting collection $C$ and any polynomial $Q$, for all but finitely many $k$ and for all $0 < i < P(k)$:

$$p_{k,i}^C < \frac{1}{2} + \frac{1}{Q(k)}$$

Probabilistic Turing machines

We can also define the next-bit test in terms of probabilistic Turing machines, although this definition is somewhat stronger (see Adleman's theorem). Let $\mathcal{M}$ be a probabilistic Turing machine working in polynomial time. Let $p_{k,i}^{\mathcal{M}}$ be the probability that $\mathcal{M}$ predicts the $(i+1)$st bit correctly, i.e.:

$$p_{k,i}^{\mathcal{M}} = \mathcal{P}\left[ \mathcal{M}(s_1 \ldots s_i) = s_{i+1} \,\middle|\, s \in S_k \text{ chosen with probability } \mu_k(s) \right]$$

We say that the collection $S = \{S_k\}$ passes the next-bit test if for every polynomial $Q$, for all but finitely many $k$, and for all $0 < i < P(k)$:

$$p_{k,i}^{\mathcal{M}} < \frac{1}{2} + \frac{1}{Q(k)}$$
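
To make the quantity $p_{k,i}^{\mathcal{M}}$ concrete, the following sketch estimates it by Monte Carlo sampling for a deliberately weak toy generator (its last bit is the parity of the preceding ones) and a matching toy predictor, then compares the estimate with the $\frac{1}{2} + \frac{1}{Q(k)}$ threshold. The generator, the predictor, and the parameter values are illustrative assumptions, not part of the definitions above.

```python
import secrets

P_K = 16          # P(k): output length of the toy generator (illustrative choice)
TRIALS = 100_000  # number of Monte Carlo samples used to estimate p_{k,i}

def toy_generator() -> list[int]:
    """Toy 'pseudo-random' source: P_K - 1 truly random bits followed by their parity.
    It obviously fails the next-bit test at position i = P_K - 1."""
    bits = [secrets.randbits(1) for _ in range(P_K - 1)]
    bits.append(sum(bits) % 2)  # the last bit is fully determined by the prefix
    return bits

def parity_predictor(prefix: list[int]) -> int:
    """Next-bit predictor: guess that the next bit is the parity of the prefix seen so far."""
    return sum(prefix) % 2

def estimate_p(i: int) -> float:
    """Estimate p_{k,i}: the probability of predicting bit i+1 from the first i bits."""
    hits = 0
    for _ in range(TRIALS):
        s = toy_generator()
        hits += int(parity_predictor(s[:i]) == s[i])
    return hits / TRIALS

if __name__ == "__main__":
    i = P_K - 1   # predict the last bit from the first P_K - 1 bits
    q_k = 100     # stand-in value for Q(k) at this k (illustrative)
    p = estimate_p(i)
    print(f"estimated p_k,i = {p:.4f}, threshold 1/2 + 1/Q(k) = {0.5 + 1 / q_k:.4f}")
    # For this generator p_k,i is essentially 1, far above the threshold: it fails the test.
```
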

Completeness for Yao's test

The next-bit test is a particular case of Yao's test for random sequences, and passing it is therefore a necessary condition for passing Yao's test. However, Yao also showed that it is a sufficient condition. [1]

We now prove this in the probabilistic Turing machine case, since Adleman has already done the work of replacing randomization with non-uniformity in his theorem. The case of Boolean circuits cannot be derived from this case (since it involves deciding potentially undecidable problems), but the proof of Adleman's theorem can be easily adapted to the case of non-uniform Boolean circuit families.

Let $\mathcal{M}$ be a distinguisher for the probabilistic version of Yao's test, i.e. a probabilistic Turing machine running in polynomial time, such that there is a polynomial $Q$ such that for infinitely many $k$:

$$\left| p_{k,S}^{\mathcal{M}} - p_{k,U}^{\mathcal{M}} \right| \geq \frac{1}{Q(k)}$$

where $p_{k,S}^{\mathcal{M}}$ and $p_{k,U}^{\mathcal{M}}$ denote the probabilities that $\mathcal{M}$ accepts a string drawn from $S_k$ (with distribution $\mu_k$) and from the uniform distribution on $\{0,1\}^{P(k)}$, respectively.

Let $R_{k,i} = \{ s_1 \ldots s_i u_{i+1} \ldots u_{P(k)} \mid s \in S_k,\ u_j \in \{0,1\} \}$ be the set of strings whose first $i$ bits are a prefix of some string of $S_k$ and whose remaining bits are arbitrary. We have $R_{k,0} = \{0,1\}^{P(k)}$ and $R_{k,P(k)} = S_k$. Then, writing $p_{k,R_{k,i}}^{\mathcal{M}}$ for the probability that $\mathcal{M}$ accepts a string drawn from $R_{k,i}$ according to the distribution $\mu_{k,i}$ defined below, we notice that

$$\sum_{i=0}^{P(k)-1} \left| p_{k,R_{k,i+1}}^{\mathcal{M}} - p_{k,R_{k,i}}^{\mathcal{M}} \right| \geq \left| p_{k,R_{k,P(k)}}^{\mathcal{M}} - p_{k,R_{k,0}}^{\mathcal{M}} \right| = \left| p_{k,S}^{\mathcal{M}} - p_{k,U}^{\mathcal{M}} \right| \geq \frac{1}{Q(k)}.$$

Therefore, at least one of the terms $\left| p_{k,R_{k,i+1}}^{\mathcal{M}} - p_{k,R_{k,i}}^{\mathcal{M}} \right|$ must be no smaller than $\frac{1}{Q(k)P(k)}$.

Next, we consider probability distributions $\mu_{k,i}$ and $\overline{\mu_{k,i}}$ on $R_{k,i}$. Distribution $\mu_{k,i}$ is the probability distribution of choosing the first $i$ bits in $S_k$ with probability given by $\mu_k$, and the remaining $P(k) - i$ bits uniformly at random; $\overline{\mu_{k,i}}$ is defined in the same way, except that the $i$-th bit is complemented. We have thus:

$$\mu_{k,i}(w_1 \ldots w_{P(k)}) = \frac{1}{2^{P(k)-i}}\, \mathcal{P}\left[ s_1 \ldots s_i = w_1 \ldots w_i \,\middle|\, s \in S_k \text{ with probability } \mu_k(s) \right]$$

$$\overline{\mu_{k,i}}(w_1 \ldots w_{P(k)}) = \frac{1}{2^{P(k)-i}}\, \mathcal{P}\left[ s_1 \ldots s_{i-1}\,\overline{s_i} = w_1 \ldots w_i \,\middle|\, s \in S_k \text{ with probability } \mu_k(s) \right]$$

We thus have $\mu_{k,i} = \frac{1}{2}\left(\mu_{k,i+1} + \overline{\mu_{k,i+1}}\right)$ (a simple calculation shows this), so the distributions $\mu_{k,i+1}$ and $\overline{\mu_{k,i+1}}$ can be distinguished by $\mathcal{M}$. Without loss of generality (flipping $\mathcal{M}$'s answer if necessary), we can assume that $p_{\mu_{k,i+1}}^{\mathcal{M}} - p_{\overline{\mu_{k,i+1}}}^{\mathcal{M}} \geq \frac{1}{R(k)}$, with $R$ a polynomial.
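
For concreteness, here is one way to spell out that calculation (a sketch in our own notation, writing $p_{\nu}^{\mathcal{M}}$ for the probability that $\mathcal{M}$ accepts a string drawn according to a distribution $\nu$, so that $p_{k,R_{k,i}}^{\mathcal{M}} = p_{\mu_{k,i}}^{\mathcal{M}}$). Since drawing bit $w_{i+1}$ uniformly at random amounts to taking, with probability $\frac{1}{2}$ each, either the true bit $s_{i+1}$ or its complement,

$$\mu_{k,i}(w) = \frac{1}{2}\,\mu_{k,i+1}(w) + \frac{1}{2}\,\overline{\mu_{k,i+1}}(w) \qquad \text{for every } w \in \{0,1\}^{P(k)}.$$

By linearity of the acceptance probability, $p_{\mu_{k,i}}^{\mathcal{M}} = \frac{1}{2}\left(p_{\mu_{k,i+1}}^{\mathcal{M}} + p_{\overline{\mu_{k,i+1}}}^{\mathcal{M}}\right)$, hence

$$\left| p_{\mu_{k,i+1}}^{\mathcal{M}} - p_{\overline{\mu_{k,i+1}}}^{\mathcal{M}} \right| = 2\left| p_{\mu_{k,i+1}}^{\mathcal{M}} - p_{\mu_{k,i}}^{\mathcal{M}} \right| \geq \frac{2}{Q(k)P(k)}$$

for the index $i$ selected above, so one may take $R(k) = Q(k)P(k)$.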

This gives us a possible construction of a Turing machine $\mathcal{N}$ solving the next-bit test: upon receiving the first $i$ bits of a sequence, $\mathcal{N}$ pads this input with a guess of bit $\ell$ and then $P(k) - i - 1$ random bits, chosen with uniform probability. Then it runs $\mathcal{M}$, and outputs $\ell$ if the result is $1$, and $1 - \ell$ otherwise.
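
A minimal sketch of this construction, assuming we are handed the distinguisher as a Python function `distinguisher(bits)` returning 0 or 1 (the analogue of $\mathcal{M}$) together with the total length $P(k)$; the function names and the uniformly random choice of the guessed bit $\ell$ are assumptions made for illustration.

```python
import secrets

def next_bit_predictor(prefix: list[int], total_len: int, distinguisher) -> int:
    """Predict bit i+1 from the first i bits, given a distinguisher M for Yao's test.

    Following the construction above: guess a bit l, pad the prefix with l and then
    total_len - len(prefix) - 1 uniformly random bits, run M on the padded string,
    and output l if M answers 1, and 1 - l otherwise.
    """
    ell = secrets.randbits(1)                      # the guessed bit l
    padding = [secrets.randbits(1)
               for _ in range(total_len - len(prefix) - 1)]
    padded = prefix + [ell] + padding              # ~ mu_{k,i+1} if ell == s_{i+1}, else ~ its complement
    return ell if distinguisher(padded) == 1 else 1 - ell
```

Intuitively, if $\mathcal{M}$ answers $1$ more often on $\mu_{k,i+1}$ than on $\overline{\mu_{k,i+1}}$, then agreeing with its verdict biases the output towards the true bit $s_{i+1}$, which is exactly the advantage quantified above.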

References

  1. Andrew Chi-Chih Yao. "Theory and Applications of Trapdoor Functions". In Proceedings of the 23rd IEEE Symposium on Foundations of Computer Science, 1982.
  2. Manuel Blum and Silvio Micali. "How to Generate Cryptographically Strong Sequences of Pseudo-Random Bits". SIAM Journal on Computing, Vol. 13, No. 4, November 1984.