Algorithmically random sequence

Intuitively, an algorithmically random sequence (or random sequence ) is a sequence of binary digits that appears random to any algorithm running on a (prefix-free or not) universal Turing machine. The notion can be applied analogously to sequences on any finite alphabet (e.g. decimal digits). Random sequences are key objects of study in algorithmic information theory.

In measure-theoretic probability theory, introduced by Andrey Kolmogorov in 1933, there is no such thing as a random sequence. For example, consider flipping a fair coin infinitely many times. Any particular sequence, be it the constant sequence $000000\dots$ or an irregular-looking one such as $011010\dots$, has the same probability: exactly zero. There is no way to state that one sequence is "more random" than another sequence, using the language of measure-theoretic probability. However, it is intuitively obvious that $011010\dots$ looks more random than $000000\dots$. Algorithmic randomness theory formalizes this intuition.

As different types of algorithms are sometimes considered, ranging from algorithms with specific bounds on their running time to algorithms which may ask questions of an oracle machine, there are different notions of randomness. The most common of these is known as Martin-Löf randomness (K-randomness or 1-randomness), but stronger and weaker forms of randomness also exist. When the term "algorithmically random" is used to refer to a particular single (finite or infinite) sequence without clarification, it is usually taken to mean "incompressible" or, in the case the sequence is infinite and prefix algorithmically random (i.e., K-incompressible), "Martin-Löf–Chaitin random".

Since its inception, Martin-Löf randomness has been shown to admit many equivalent characterizations—in terms of compression, randomness tests, and gambling—that bear little outward resemblance to the original definition, but each of which satisfies our intuitive notion of properties that random sequences ought to have: random sequences should be incompressible, they should pass statistical tests for randomness, and it should be difficult to make money betting on them. The existence of these multiple definitions of Martin-Löf randomness, and the stability of these definitions under different models of computation, give evidence that Martin-Löf randomness is a fundamental property of mathematics and not an accident of Martin-Löf's particular model.

It is important to disambiguate between algorithmic randomness and stochastic randomness. Unlike algorithmic randomness, which is defined for computable (and thus deterministic) processes, stochastic randomness is usually said to be a property of a sequence that is a priori known to be generated by (or is the outcome of) an independent identically distributed equiprobable stochastic process.

Because infinite sequences of binary digits can be identified with real numbers in the unit interval, random binary sequences are often called (algorithmically) random real numbers. Additionally, infinite binary sequences correspond to characteristic functions of sets of natural numbers; therefore those sequences might be seen as sets of natural numbers.

The class of all Martin-Löf random (binary) sequences is denoted by RAND or MLR.

History

Richard von Mises

Richard von Mises formalized the notion of a test for randomness in order to define a random sequence as one that passed all tests for randomness. He defined a "collective" (Kollektiv) to be an infinite binary string $x_1 x_2 x_3 \dots$ such that: the limiting relative frequency of 1s exists, $\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} x_i = p$; and every infinite subsequence selected from it by an "admissible rule" has the same limiting frequency $p$.

To pick out a subsequence, first pick a selection function $\phi$ such that, given any finite binary string $x_1 \dots x_n$, it outputs either 0 or 1. If it outputs 1, then the next bit $x_{n+1}$ is added to the subsequence; otherwise we continue. In this definition, some admissible rules might abstain forever on some sequences, and thus fail to pick out an infinite subsequence. We only consider those that do pick an infinite subsequence.
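As a concrete illustration, here is a minimal sketch (in Python) of applying a selection rule to a bit stream; the particular rule `phi`, which selects the bit following every occurrence of the pattern 01, is only a hypothetical example of an admissible rule:

```python
def select_subsequence(bits, phi):
    """Apply a selection rule phi to a bit stream.

    phi sees only the prefix read so far and decides whether the *next*
    bit is included in the subsequence, before that bit is revealed.
    """
    prefix = []
    for b in bits:
        if phi(prefix):        # decision is made before seeing b
            yield b
        prefix.append(b)

# Hypothetical admissible rule: select the bit that follows every "01".
def phi(prefix):
    return len(prefix) >= 2 and prefix[-2:] == [0, 1]

sample = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1]
print(list(select_subsequence(sample, phi)))   # bits picked out by the rule
```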

Stated in another way, each infinite binary string is a coin-flip game, and an admissible rule is a way for a gambler to decide when to place bets. A collective is a coin-flip game where there is no way for one gambler to do better than another over the long run. That is, there is no gambling system that works for the game.

The definition generalizes from a binary alphabet to a countable alphabet: a collective is then an infinite sequence in which every symbol has a well-defined limiting relative frequency, and every subsequence selected by an admissible rule has the same limiting frequencies.

Usually the admissible rules are defined to be rules computable by a Turing machine, and we require $p = 1/2$. With this, we have the Mises–Wald–Church random sequences. This is not a restriction, since given a sequence with $p = 1/2$, we can construct random sequences with any other computable $p$. [1] Here, "Church" refers to Alonzo Church, whose 1940 paper proposed using Turing-computable rules. [2]

Theorem. (Abraham Wald, 1936, 1937) [3] If there are only countably many admissible rules, then almost every sequence is a collective.

Proof sketch: Use measure-theoretic probability.

Fix one admissible rule. Sample a random sequence from the fair-coin Bernoulli space. With probability 1 (by a martingale argument), the subsequence picked by the admissible rule still has limiting frequency $1/2$. Now take all the countably many rules together: with probability 1, the subsequence picked by each rule still has limiting frequency $1/2$.
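The phenomenon can be illustrated empirically (a sketch, not a proof; the selection rule below is an arbitrary computable example, and Python's pseudorandom generator merely stands in for fair coin flips):

```python
import random

def phi(prefix):
    # Hypothetical computable selection rule: select after seeing two equal bits.
    return len(prefix) >= 2 and prefix[-1] == prefix[-2]

random.seed(0)
prefix, selected = [], []
for _ in range(200_000):
    b = random.getrandbits(1)          # fair coin flip
    if phi(prefix):
        selected.append(b)
    prefix.append(b)

# The selected subsequence still has relative frequency close to 1/2.
print(len(selected), sum(selected) / len(selected))
```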

Counterexample. (Jean Ville, 1939) [4] If there are only countably many admissible rules, then there exists a collective $x$ (with limiting frequency $1/2$) such that $\frac{1}{n}\sum_{i=1}^{n} x_i \geq \frac{1}{2}$ for all $n$.

Proof: see [5].

Intuitively, the long-run average of a random sequence should oscillate on both sides of $1/2$, just as a random walk should cross the origin infinitely many times. The counterexample suggests that the von Mises definition is not strong enough.

Per Martin-Löf

The Ville counterexample suggests that the Mises–Wald–Church sense of randomness is not good enough, because some such sequences do not satisfy some laws of randomness. For example, the Ville counterexample does not satisfy one of the laws of the iterated logarithm: with probability 1 a fair-coin sequence $x_1 x_2 \dots$ satisfies $\liminf_{n\to\infty} \frac{\sum_{i=1}^{n} x_i - n/2}{\sqrt{(n/2) \ln\ln n}} = -1$, whereas for Ville's sequence the numerator never becomes negative.

Naively, one can fix this by requiring a sequence to satisfy all possible laws of randomness, where a "law of randomness" is a property that is satisfied by all sequences with probability 1. However, for each infinite sequence $x$, the property of "not being equal to $x$" is itself a law of randomness, since the singleton $\{x\}$ has probability 0; this leads to the conclusion that no sequence at all would be random.

Per Martin-Löf (1966) [6] defined "Martin-Löf randomness" by only allowing laws of randomness that are Turing-computable. In other words, a sequence is random if and only if it passes all Turing-computable tests of randomness.

The thesis that the definition of Martin-Löf randomness "correctly" captures the intuitive notion of randomness has been called the Martin-Löf–Chaitin Thesis; it is somewhat similar to the Church–Turing thesis. [7]

Martin-Löf–Chaitin Thesis. The mathematical concept of "Martin-Löf randomness" captures the intuitive notion of an infinite sequence being "random".

Church–Turing thesis. The mathematical concept of "computable by Turing machines" captures the intuitive notion of a function being "computable".

Just as Turing computability has many equivalent definitions, Martin-Löf randomness also has many equivalent definitions; see the next section.

Three equivalent definitions

Martin-Löf's original definition of a random sequence was in terms of constructive null covers; he defined a sequence to be random if it is not contained in any such cover. Gregory Chaitin, Leonid Levin and Claus-Peter Schnorr proved a characterization in terms of algorithmic complexity: a sequence is random if there is a uniform bound on the compressibility of its initial segments. Schnorr gave a third equivalent definition in terms of martingales. Li and Vitanyi's book An Introduction to Kolmogorov Complexity and Its Applications is the standard introduction to these ideas.

An infinite sequence $S$ is Martin-Löf random if and only if there is a constant $c$ such that all of $S$'s finite prefixes are $c$-incompressible; more succinctly, $K(S \upharpoonright n) \geq n - c$ for all $n$, where $K$ denotes prefix-free Kolmogorov complexity and $S \upharpoonright n$ is the first $n$ bits of $S$.

A sequence is defined to be Martin-Löf random if it is not contained in any $G_\delta$ set determined by a constructive null cover, that is, by a uniformly effectively open sequence of sets $U_1, U_2, U_3, \dots$ with $\mu(U_t) \leq 2^{-t}$ for all positive integers $t$.

A sequence is Martin-Löf random if and only if no constructive martingale succeeds on it.
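For completeness, the martingale terminology used in the third characterization can be spelled out (standard definitions, stated for the uniform measure on Cantor space): a martingale is a function $d : \{0,1\}^* \to [0,\infty)$ satisfying the fairness condition

$$d(w) \;=\; \frac{d(w0) + d(w1)}{2} \qquad \text{for every finite string } w;$$

it succeeds on a sequence $S$ if $\limsup_{n \to \infty} d(S \upharpoonright n) = \infty$, and it is constructive (effective) if it is lower semi-computable.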

Interpretations of the definitions

The Kolmogorov complexity characterization conveys the intuition that a random sequence is incompressible: no prefix can be produced by a program much shorter than the prefix.
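As a rough illustration only: zlib compression is a crude, computable stand-in for Kolmogorov complexity (which is uncomputable), but it already shows the contrast between a patterned prefix and an irregular one:

```python
import os
import zlib

patterned = bytes(10000)              # 10,000 zero bytes: a highly regular prefix
irregular = os.urandom(10000)         # 10,000 bytes of OS randomness

# The regular prefix collapses to a few dozen bytes, while the
# random-looking prefix does not compress at all.
for name, s in [("patterned", patterned), ("irregular", irregular)]:
    print(name, len(s), "->", len(zlib.compress(s, 9)))
```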

The null cover characterization conveys the intuition that a random real number should not have any property that is "uncommon". Each measure 0 set can be thought of as an uncommon property. It is not possible for a sequence to lie in no measure 0 sets, because each one-point set has measure 0. Martin-Löf's idea was to limit the definition to measure 0 sets that are effectively describable; the definition of an effective null cover determines a countable collection of effectively describable measure 0 sets and defines a sequence to be random if it does not lie in any of these particular measure 0 sets. Since the union of a countable collection of measure 0 sets has measure 0, this definition immediately leads to the theorem that there is a measure 1 set of random sequences. Note that if we identify the Cantor space of binary sequences with the interval [0,1] of real numbers, the measure on Cantor space agrees with Lebesgue measure.

An effective measure 0 set can be interpreted as a Turing machine that is able to tell, given an infinite binary string, whether the string looks random at each of a sequence of statistical significance levels. The set is the intersection of shrinking sets $U_1 \supseteq U_2 \supseteq U_3 \supseteq \cdots$, and since each set $U_t$ is specified by an enumerable sequence of prefixes, given any infinite binary string, if it is in $U_t$, then the Turing machine can decide in finite time that the string does fall inside $U_t$. Therefore, it can "reject the hypothesis that the string is random at significance level $2^{-t}$". If the Turing machine can reject the hypothesis at all significance levels, then the string is not random. A random string is one that, for each Turing-computable test of randomness, manages to remain forever un-rejected at some significance level. [8]

The martingale characterization conveys the intuition that no effective procedure should be able to make money betting against a random sequence. A martingale d is a betting strategy. d reads a finite string w and bets money on the next bit. It bets some fraction of its money that the next bit will be 0, and the remainder of its money that the next bit will be 1. d doubles the money it placed on the bit that actually occurred, and it loses the rest. d(w) is the amount of money it has after seeing the string w. Since the bet placed after seeing the string w can be calculated from the values d(w), d(w0), and d(w1), calculating the amount of money it has is equivalent to calculating the bet. The martingale characterization says that no betting strategy implementable by any computer (even in the weak sense of constructive strategies, which are not necessarily computable) can make money betting on a random sequence.
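A minimal sketch of this bookkeeping, assuming a toy strategy that leans toward the next bit repeating the previous one (an arbitrary illustrative choice, not a universal martingale):

```python
def run_martingale(bits, bet_fraction_on_zero):
    """Track the capital d(w) of a betting strategy along a bit sequence.

    bet_fraction_on_zero(prefix) returns the fraction of current capital
    wagered on the next bit being 0; the rest is wagered on 1. Money on
    the correct bit is doubled, money on the wrong bit is lost, so the
    strategy satisfies the fairness condition d(w) = (d(w0) + d(w1)) / 2.
    """
    capital, prefix = 1.0, []
    for b in bits:
        f = bet_fraction_on_zero(prefix)
        capital = 2 * capital * (f if b == 0 else 1 - f)
        prefix.append(b)
    return capital

# Illustrative strategy: bet 70% of capital that the next bit repeats the last one.
def repeat_better(prefix):
    if not prefix:
        return 0.5
    return 0.7 if prefix[-1] == 0 else 0.3

print(run_martingale([0, 0, 0, 0, 0, 0, 0, 0], repeat_better))        # capital grows on a regular sequence
print(run_martingale([0, 1, 1, 0, 1, 0, 0, 1], repeat_better))        # capital shrinks when the guess fits poorly
```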

Properties and examples of Martin-Löf random sequences

Universality

There is a universal constructive martingale D. This martingale is universal in the sense that, given any constructive martingale d, if d succeeds on a sequence, then D succeeds on that sequence as well. Thus, D succeeds on every sequence in RANDc, the complement of RAND (but, since D is constructive, it succeeds on no sequence in RAND). (Schnorr 1971)

There is a constructive null cover of RANDc. This means that all effective tests for randomness (that is, constructive null covers) are, in a sense, subsumed by this universal test for randomness, since any sequence that passes this single test for randomness will pass all tests for randomness. (Martin-Löf 1966) Intuitively, this universal test for randomness says "if the sequence has increasingly long prefixes that can be increasingly well compressed on this universal Turing machine, then it is not random" (see the next section).

Construction sketch: Enumerate the effective null covers as $(U^{(1)}_t)_t, (U^{(2)}_t)_t, (U^{(3)}_t)_t, \dots$; the enumeration is itself effective (it can be produced by a modified universal Turing machine). A universal effective null cover is then obtained by diagonalization: $V_t = \bigcup_{i \geq 1} U^{(i)}_{t+i}$.
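A short calculation confirms that the diagonal cover is again a null cover (using the bound $\mu(U^{(i)}_t) \leq 2^{-t}$ from the definition of an effective null cover):

$$\mu(V_t) \;\leq\; \sum_{i \geq 1} \mu\bigl(U^{(i)}_{t+i}\bigr) \;\leq\; \sum_{i \geq 1} 2^{-(t+i)} \;=\; 2^{-t},$$

so $(V_t)_t$ is itself a constructive null cover, and a sequence is Martin-Löf random exactly when it is not contained in $\bigcap_t V_t$.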

Passing randomness tests

If a sequence fails an algorithmic randomness test, then it is algorithmically compressible. Conversely, if it is algorithmically compressible, then it fails an algorithmic randomness test.

Construction sketch: Suppose the sequence fails a randomness test; then it can be compressed by lexicographically enumerating all sequences that fail the test, and then coding the location of the sequence in the list of all such sequences. This is called "enumerative source encoding". [9]

Conversely, if the sequence is compressible, then by the pigeonhole principle, only a vanishingly small fraction of sequences are like that, so we can define a new test for randomness by "has a compression by this universal Turing machine". Incidentally, this is the universal test for randomness.
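The counting behind "only a vanishingly small fraction" is the standard one: every compressed string needs its own description, and there are few short descriptions. There are only $\sum_{j=0}^{n-c-1} 2^{j} = 2^{\,n-c} - 1$ binary descriptions of length less than $n - c$, so

$$\#\{\, w \in \{0,1\}^{n} : K(w) < n - c \,\} \;\leq\; 2^{\,n-c} - 1,$$

that is, fewer than a fraction $2^{-c}$ of all length-$n$ strings can be compressed by more than $c$ bits.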

For example, consider a binary sequence of length $n$ sampled IID from the Bernoulli($p$) distribution. After taking a large number of samples, we should have about $pn$ ones. We can code for this sequence as: "Generate all binary sequences of length $n$ with exactly $pn$ ones. Of those, output the $k$-th sequence in lexicographic order."

By the Stirling approximation, $\binom{n}{pn} \approx 2^{n H(p)}$, where $H(p) = -p \log_2 p - (1-p)\log_2 (1-p)$ is the binary entropy function. Thus, the number of bits in this description is roughly $2\log_2 n + n H(p) + O(1)$.

The first term is for prefix-coding the numbers $n$ and $pn$. The second term is for prefix-coding the index $k$. (Use Elias omega coding.) The third term is for prefix-coding the rest of the description.

When $n$ is large, this description has just about $n H(p)$ bits, and so the sequence is compressible, with compression ratio $H(p)$. In particular, the compression ratio is exactly one (incompressible) only when $p = 1/2$. (Example 14.2.8 [10])
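A sketch of the indexing step of enumerative source encoding, using the standard combinatorial ranking of strings with a fixed number of ones (the function names are illustrative):

```python
from math import comb

def rank(bits):
    """Lexicographic index of `bits` among all strings with the same
    length and the same number of ones (enumerative source encoding)."""
    n, k, index = len(bits), sum(bits), 0
    for i, b in enumerate(bits):
        remaining = n - i - 1
        if b == 1:
            # Skip every string that puts a 0 here: comb(remaining, k) of them.
            index += comb(remaining, k)
            k -= 1
    return index

def unrank(n, k, index):
    """Inverse: recover the string of length n with k ones at this index."""
    bits = []
    for i in range(n):
        remaining = n - i - 1
        zeros_here = comb(remaining, k)
        if index < zeros_here:
            bits.append(0)
        else:
            bits.append(1)
            index -= zeros_here
            k -= 1
    return bits

s = [0, 1, 1, 0, 1, 0, 0, 1]
i = rank(s)
assert unrank(len(s), sum(s), i) == s
print(i)   # the sequence is fully described by (n, number of ones, index)
```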

Impossibility of a gambling system

If a roulette table generates an algorithmically random sequence, then there is no way to beat the dealer. If it is not, then there is a way.

Consider a casino offering fair odds at a roulette table. The roulette table generates a sequence of random numbers. If this sequence is algorithmically random, then there is no lower semi-computable strategy to win, which in turn implies that there is no computable strategy to win. That is, for any gambling algorithm, the long-term log-payoff is zero (neither positive nor negative). Conversely, if this sequence is not algorithmically random, then there is a lower semi-computable strategy to win.
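A toy illustration of the contrast, not the formal theorem: the computable strategy below (which leans toward the bit seen three positions earlier, a hypothetical choice) achieves a clearly positive long-run log-payoff on a periodic, highly compressible sequence, but essentially none on irregular pseudorandom bits, which here merely stand in for an algorithmically random source:

```python
import math
import random

def avg_log_payoff(bits, predict):
    """Average log2 capital growth per round for proportional betting at fair odds."""
    total = 0.0
    prefix = []
    for b in bits:
        p0 = predict(prefix)                       # fraction of capital bet on "next bit is 0"
        total += math.log2(2 * (p0 if b == 0 else 1 - p0))
        prefix.append(b)
    return total / len(bits)

def predict_period3(prefix):
    # Computable toy strategy: lean slightly toward the bit seen 3 places back.
    if len(prefix) < 3:
        return 0.5
    return 0.6 if prefix[-3] == 0 else 0.4

periodic = [0, 0, 1] * 3000                        # highly compressible sequence
random.seed(1)
irregular = [random.getrandbits(1) for _ in range(9000)]

print(avg_log_payoff(periodic, predict_period3))   # clearly positive: the pattern is exploited
print(avg_log_payoff(irregular, predict_period3))  # near zero: no long-run gain on irregular bits
```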

Examples

Relation to the arithmetic hierarchy

Relative randomness

As each of the equivalent definitions of a Martin-Löf random sequence is based on what is computable by some Turing machine, one can naturally ask what is computable by a Turing oracle machine. For a fixed oracle A, a sequence B which is not only random but in fact satisfies the equivalent definitions of randomness relativized to A (e.g., no martingale which is constructive relative to the oracle A succeeds on B) is said to be random relative to A. Two sequences, while themselves random, may contain very similar information, and therefore neither will be random relative to the other. Any time there is a Turing reduction from one sequence to another, the second sequence cannot be random relative to the first, just as computable sequences are themselves nonrandom; in particular, this means that Chaitin's Ω is not random relative to the halting problem.

An important result relating to relative randomness is van Lambalgen's theorem, which states that if C is the sequence composed from A and B by interleaving the first bit of A, the first bit of B, the second bit of A, the second bit of B, and so on, then C is algorithmically random if and only if A is algorithmically random, and B is algorithmically random relative to A. A closely related consequence is that if A and B are both random themselves, then A is random relative to B if and only if B is random relative to A.
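Written symbolically, with $A \oplus B$ denoting the interleaved sequence and $\mathrm{MLR}^{A}$ the class of sequences that are Martin-Löf random relative to $A$, the theorem states:

$$A \oplus B \in \mathrm{MLR} \iff A \in \mathrm{MLR} \ \text{ and } \ B \in \mathrm{MLR}^{A}.$$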

Stronger than Martin-Löf randomness

Relative randomness gives us the first notion which is stronger than Martin-Löf randomness, which is randomness relative to some fixed oracle A. For any oracle, this is at least as strong, and for most oracles, it is strictly stronger, since there will be Martin-Löf random sequences which are not random relative to the oracle A. Important oracles often considered are the halting problem, $\emptyset'$, and the $n$th jump oracle, $\emptyset^{(n)}$, as these oracles are able to answer specific questions which naturally arise. A sequence which is random relative to the oracle $\emptyset^{(n-1)}$ is called $n$-random; a sequence is 1-random, therefore, if and only if it is Martin-Löf random. A sequence which is $n$-random for every $n$ is called arithmetically random. The $n$-random sequences sometimes arise when considering more complicated properties. For example, there are only countably many $\Delta^0_2$ sets, so one might think that these should be non-random. However, the halting probability Ω is $\Delta^0_2$ and 1-random; it is only after 2-randomness is reached that it is impossible for a random set to be $\Delta^0_2$.

Weaker than Martin-Löf randomness

Additionally, there are several notions of randomness which are weaker than Martin-Löf randomness. These include weak 1-randomness, Schnorr randomness, computable randomness, and partial computable randomness. Yongge Wang showed [11] that Schnorr randomness is different from computable randomness. Additionally, Kolmogorov–Loveland randomness is known to be no stronger than Martin-Löf randomness, but it is not known whether it is actually weaker.

At the opposite end of the randomness spectrum there is the notion of a K-trivial set. These sets are anti-random in that every initial segment is logarithmically compressible (i.e., $K(w) \leq K(|w|) + b$ for some constant $b$ and each initial segment $w$), yet they need not be computable.

See also

References

  1. Li, Ming; Vitányi, P. M. (2019). "1.9 Randomness". An introduction to Kolmogorov complexity and its applications (Fourth ed.). Cham: Springer. ISBN   978-3-030-11298-1.
  2. Copeland, Arthur H. (June 1940). "Alonzo Church. On the concept of a random sequence. Bulletin of the American Mathematical Society, vol. 46 (1940), pp. 130–135". The Journal of Symbolic Logic (Review). 5 (2): 71–72. doi:10.2307/2266178. ISSN   0022-4812. JSTOR   2266178. S2CID   124646586.
  3. Wald, A. (1936). Sur la notion de collectif dans le calcul des probabilités. Comptes Rendus des Séances de l'Académie des Sciences, 202, 180–183. Wald, A. (1937). Die Widerspruchsfreiheit des Kollektivbegriffes der Wahrscheinlichkeitsrechnung. Ergebnisse eines Mathematischen Kolloquiums, 8, 38–72.
  4. Ville, J. (1939). Étude critique de la notion de collectif, Monographies des Probabilités, Calcul des Probabilités et ses Applications, Gauthier-Villars.
  5. Lieb, Elliott H.; Osherson, Daniel; Weinstein, Scott (2006-07-11). "Elementary Proof of a Theorem of Jean Ville". arXiv:cs/0607054.
  6. Martin-Löf, Per (1966-12-01). "The definition of random sequences". Information and Control. 9 (6): 602–619. doi:10.1016/S0019-9958(66)80018-9. ISSN   0019-9958.
  7. Jean-Paul Delahaye, Randomness, Unpredictability and Absence of Order, in Philosophy of Probability, p. 145–167, Springer 1993.
  8. Li, Vitányi, Section 2.4
  9. Cover, T. (January 1973). "Enumerative source encoding". IEEE Transactions on Information Theory. 19 (1): 73–77. doi:10.1109/TIT.1973.1054929. ISSN   0018-9448.
  10. Cover, Thomas M.; Thomas, Joy A. (2006-07-18). Elements of Information Theory (2nd ed.). Hoboken, N.J.: Wiley-Interscience. ISBN 978-0-471-24195-9.
  11. Yongge Wang: Randomness and Complexity. PhD Thesis, 1996, http://webpages.uncc.edu/yonwang/papers/thesis.pdf

Further reading