Incompressibility method

In mathematics, the incompressibility method is a proof method akin to the probabilistic method, the counting method and the pigeonhole principle. To prove that the objects in a certain class satisfy a certain property on average, select an object of that class that is incompressible. If the object does not satisfy the property, then it can be compressed by a computable coding. Since it can generally be proven that almost all objects in a given class are incompressible, the argument demonstrates that almost all objects in the class have the property involved (not just the average object). Selecting a specific incompressible object is ineffective and cannot be done by a computer program; however, a simple counting argument usually shows that almost all objects of a given class cannot be compressed by more than a few bits, and are therefore incompressible in the required sense.
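
The counting argument behind the last statement can be made explicit; the following is a standard sketch for binary strings of length n, with an arbitrary compression margin c:

```latex
% There are 2^n binary strings of length n, but fewer than 2^{n-c} candidate
% descriptions (programs) of length below n - c:
\[
  \#\{p : |p| < n - c\} \;=\; \sum_{i=0}^{n-c-1} 2^{i} \;=\; 2^{\,n-c} - 1 \;<\; 2^{\,n-c},
\]
\[
  \Pr_{x\in\{0,1\}^n}\bigl[\,C(x) < n - c\,\bigr] \;<\; \frac{2^{\,n-c}}{2^{n}} \;=\; 2^{-c}.
\]
% Hence at least a fraction 1 - 2^{-c} of all n-bit strings x satisfy C(x) >= n - c.
```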

History

The incompressibility method depends on an objective, fixed notion of incompressibility. Such a notion was provided by the Kolmogorov complexity theory, named for Andrey Kolmogorov. [1]

One of the first uses of the incompressibility method with Kolmogorov complexity in the theory of computation was to prove that the running time of a one-tape Turing machine is quadratic for accepting a palindromic language and that comparison sorting algorithms require Ω(n log n) time to sort n items. [2] One of the early influential papers using the incompressibility method was published in 1980. [3] The method was applied to a number of fields, and its name was coined in a textbook. [4]

Applications

Number theory

According to an elegant Euclidean proof, there is an infinite number of prime numbers. Bernhard Riemann demonstrated that the number of primes less than a given number is connected with the zeros of the Riemann zeta function. Jacques Hadamard and Charles Jean de la Vallée-Poussin proved in 1896 that this number of primes is asymptotic to n/ln n; see Prime number theorem (ln denotes the natural logarithm and log the binary logarithm). Using the incompressibility method, G. J. Chaitin argued as follows: each n can be described by its prime factorization n = p1^e1 · p2^e2 ⋯ pk^ek (which is unique), where p1, …, pk are the first k primes, each at most n, and the exponents ei are possibly 0. Each exponent is at most log n and can be described by log log n bits, so the description of n can be given in k log log n bits, provided we know the value of log log n (enabling one to parse the consecutive blocks of exponents); describing log log n itself requires only 2 log log log n bits. By the incompressibility of most positive integers, there are, for every binary length, positive integers n of that length which cannot be described in fewer bits than their length log n. For such an n, with k = π(n) the number of primes not exceeding n, this shows that the number of primes less than n satisfies

π(n) log log n + 2 log log log n ≥ log n − O(1), that is, π(n) = Ω(log n / log log n).
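
A quick numerical sanity check of the (intentionally weak) bound obtained this way, comparing the true prime-counting function with log n / log log n and with the prime number theorem estimate; this is an illustrative sketch, and the chosen limits are arbitrary.

```python
import math

def prime_count(limit):
    """Count the primes below `limit` with a simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * limit
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, limit, i)))
    return sum(sieve)

for n in (10**3, 10**4, 10**5, 10**6):
    log_n = math.log2(n)
    weak = log_n / math.log2(log_n)   # Chaitin-style lower bound (very weak)
    pnt = n / math.log(n)             # prime number theorem estimate
    print(f"n={n:>8}  pi(n)={prime_count(n):>7}  "
          f"log n/log log n={weak:7.1f}  n/ln n={pnt:11.1f}")
```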

A more sophisticated approach, attributed to Piotr Berman (the present proof is partially due to John Tromp), describes every incompressible n by p_m and n/p_m, where p_m is the largest prime dividing n and m is its index in the enumeration of the primes. Since n is incompressible, the length of this description must be at least log n (up to an additive constant). To parse the first block of the description, m must be given in prefix (self-delimiting) form, which takes log m + g(m) bits, where g is an arbitrary, small, positive function (for example g(m) = 2 log log m); the remainder n/p_m takes log n − log p_m bits. Therefore log p_m ≤ log m + g(m) + O(1), and hence p_m = O(m log² m) for a special sequence of values of n. This shows that the expression below holds for this special sequence, and a simple extension shows that it holds for every n:

π(n) = Ω(n / log² n).
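
The inequality chain behind this argument, written out as a sketch (the overhead 2 log log m is one concrete choice for the small function g above):

```latex
% n incompressible; p_m the largest prime factor of n;
% description of n = prefix code for m, followed by n/p_m in binary.
\[
  \underbrace{\log m + 2\log\log m}_{\text{prefix code for } m}
  + \underbrace{\log\tfrac{n}{p_m}}_{\text{rest of } n}
  \;\ge\; C(n) \;\ge\; \log n - O(1)
\]
\[
  \Longrightarrow\ \log p_m \le \log m + 2\log\log m + O(1)
  \ \Longrightarrow\ p_m = O(m\log^2 m)
  \ \Longrightarrow\ \pi(n) = \Omega\!\left(\frac{n}{\log^2 n}\right).
\]
```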

Both proofs are presented in more detail in the textbook by Li and Vitanyi. [4]

Graph theory

A labeled graph G with n nodes can be represented by a string E(G) of n(n−1)/2 bits, where each bit indicates the presence (or absence) of an edge between the pair of nodes in that position. Every labeled graph whose encoding is incompressible up to δ(n) bits, that is, with C(E(G) | n) ≥ n(n−1)/2 − δ(n), has the property that the degree d of each vertex satisfies

|d − (n − 1)/2| = O(√(n (δ(n) + log n))).

To prove this by the incompressibility method: if some degree deviated by more, the description of E(G) could be compressed below n(n−1)/2 − δ(n) bits, which provides the required contradiction. This theorem is needed in a more complicated proof, in which the incompressibility argument is used a number of times to show that the number of unlabeled graphs on n nodes is asymptotically 2^(n(n−1)/2)/n!. [5]
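
A small simulation of the degree statement: a uniformly random encoding is, with high probability, close to incompressible, and its vertex degrees concentrate around (n − 1)/2 within roughly √(n log n). This is an illustrative sketch; the graph size and seed are arbitrary choices.

```python
import math
import random

random.seed(1)
n = 2000
# Encode a labeled graph on n nodes by n(n-1)/2 random bits: the bit for the
# pair {i, j} says whether that edge is present.
deg = [0] * n
for i in range(n):
    for j in range(i + 1, n):
        if random.getrandbits(1):
            deg[i] += 1
            deg[j] += 1

center = (n - 1) / 2
max_dev = max(abs(d - center) for d in deg)
print(f"n = {n}, max |d - (n-1)/2| = {max_dev:.1f}, "
      f"sqrt(n log n) = {math.sqrt(n * math.log2(n)):.1f}")
```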

Combinatorics

A transitive tournament is a complete directed graph whose edge relation is transitive: if (a, b) and (b, c) are edges, then so is (a, c). Consider the set of all tournaments on n nodes. Since a tournament is a labeled, directed complete graph, it can be encoded by a string E(T) of n(n−1)/2 bits, where each bit indicates the direction of the edge between the pair of nodes in that position. Using this encoding, it can be shown that every tournament on n nodes contains a transitive subtournament on at least 1 + ⌊log n⌋ vertices.

This was posed as the first problem in the book by Erdős and Spencer. [6] It is easily solved by the incompressibility method, [7] as are the coin-weighing problem, the number of covering families and various expected properties; for example, at least a fraction 1 − 1/n of all tournaments on n vertices have a largest transitive subtournament on not more than 1 + 2⌈log n⌉ vertices, provided n is large enough.
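
A minimal sketch of the bit-string encoding used above (the pair ordering and helper names are illustrative choices): each of the n(n − 1)/2 unordered pairs contributes one bit giving the direction of its edge, so a uniformly random bit string corresponds to a uniformly random tournament.

```python
from itertools import combinations
import random

def encode_tournament(edges, n):
    """Encode a tournament on nodes 0..n-1 as n(n-1)/2 bits.

    `edges` contains exactly one of (i, j), (j, i) for every i < j; the bit
    for the pair (i, j) with i < j is 1 iff the edge is oriented i -> j.
    """
    return "".join("1" if (i, j) in edges else "0"
                   for i, j in combinations(range(n), 2))

def decode_tournament(bits, n):
    """Inverse of encode_tournament."""
    pairs = combinations(range(n), 2)
    return {(i, j) if b == "1" else (j, i) for (i, j), b in zip(pairs, bits)}

random.seed(0)
n = 6
# A uniformly random tournament: orient each pair independently at random.
T = {(i, j) if random.getrandbits(1) else (j, i)
     for i, j in combinations(range(n), 2)}
s = encode_tournament(T, n)
assert len(s) == n * (n - 1) // 2
assert decode_tournament(s, n) == T
print(n, "nodes ->", s)
```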

If a number of events are independent of one another (in the sense of probability theory), the probability that none of the events occur can be calculated easily. If the events are dependent, the problem becomes harder. The Lovász local lemma [8] states that if events are mostly independent of one another and individually have small probability, there is a positive probability that none of them occurs. [9] It has been proven using the incompressibility method. [10] Using the incompressibility method, several versions of expanders and superconcentrator graphs were also shown to exist. [11]

Topological combinatorics

In the Heilbronn triangle problem, one throws n points into the unit square and asks for the maximum, over all possible arrangements, of the minimal area of a triangle formed by three of the points. The problem was solved for small values of n, and much work was done on the asymptotic behaviour as a function of n. The original conjecture of Heilbronn, from the early 1950s, was that this area is O(1/n²). Paul Erdős proved that this bound would be tight, constructing for every prime n an arrangement with minimal triangle area Ω(1/n²). The general problem remains unsolved, apart from the best-known lower bound Ω(log n / n²) (achievable, so Heilbronn's conjecture is not correct for general n) and upper bound n^(−8/7+ε) (proven by Komlós, Pintz, and Szemerédi in 1982 and 1981, respectively). Using the incompressibility method, the average case was studied: it was proven that if the area is too small (or too large), the arrangement can be compressed below the Kolmogorov complexity of a uniformly random arrangement (which has high Kolmogorov complexity). This proves that for the overwhelming majority of arrangements (and in expectation), the area of the smallest triangle formed by three of n points thrown uniformly at random in the unit square is Θ(1/n³). In this case, the incompressibility method proves both the lower and the upper bound of the property involved. [12]
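
A small Monte Carlo illustration of the average-case Θ(1/n³) behaviour (an illustrative sketch; the point counts, number of trials and seed are arbitrary choices): throw n uniform points, find the smallest triangle area by brute force over all triples, and compare with 1/n³.

```python
import itertools
import random

def min_triangle_area(pts):
    """Smallest area over all triangles formed by three of the points."""
    best = float("inf")
    for (ax, ay), (bx, by), (cx, cy) in itertools.combinations(pts, 3):
        area = abs((bx - ax) * (cy - ay) - (cx - ax) * (by - ay)) / 2.0
        best = min(best, area)
    return best

random.seed(0)
trials = 20
for n in (20, 40, 80):
    avg = sum(
        min_triangle_area([(random.random(), random.random()) for _ in range(n)])
        for _ in range(trials)
    ) / trials
    print(f"n={n:3d}  average min area = {avg:.2e}   1/n^3 = {1 / n**3:.2e}")
```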

Probability

The law of the iterated logarithm, the law of large numbers and the recurrence property were shown to hold for sequences of high Kolmogorov complexity using the incompressibility method, [13] as was a form of Kolmogorov's zero–one law. [14] The method was also used to show that binary strings of high Kolmogorov complexity are normal in the sense of E. Borel and to analyse the distribution of 0s and 1s in such strings. [15]

Turing-machine time complexity

The basic Turing machine, as conceived by Alan Turing in 1936, consists of a memory (a tape of potentially infinitely many cells, on each of which a symbol can be written) and a finite control with an attached read-write head that scans one cell of the tape at a time. At each step, the read-write head can change the symbol in the scanned cell and then move one cell to the left, one cell to the right, or not at all, according to instructions from the finite control. For convenience, Turing machines with two tape symbols may be considered, but this is not essential.
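
A minimal simulator for this model (an illustrative sketch; the transition-table format and the example machine are arbitrary choices made here):

```python
def run_tm(transitions, tape, start="q0", blank="_", max_steps=10_000):
    """Simulate a one-tape Turing machine.

    `transitions` maps (state, symbol) -> (new_state, new_symbol, move),
    with move in {-1, 0, +1}. A missing entry means the machine halts.
    """
    cells = dict(enumerate(tape))        # sparse tape: position -> symbol
    state, head = start, 0
    for _ in range(max_steps):
        sym = cells.get(head, blank)
        if (state, sym) not in transitions:
            break                        # no applicable rule: halt
        state, new_sym, move = transitions[(state, sym)]
        cells[head] = new_sym
        head += move
    written = [cells.get(i, blank) for i in range(min(cells), max(cells) + 1)]
    return state, "".join(written)

# Example: a two-symbol machine that flips every bit and halts at the first blank.
flip = {
    ("q0", "0"): ("q0", "1", +1),
    ("q0", "1"): ("q0", "0", +1),
    # ("q0", "_") is undefined, so the machine halts when it reaches a blank.
}
print(run_tm(flip, "100110"))            # -> ('q0', '011001')
```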

In 1968, F. C. Hennie showed that such a Turing machine requires time of order n² to recognize the language of binary palindromes in the worst case. In 1977, W. J. Paul [2] presented an incompressibility proof showing that time of order n² is also required in the average case. For every large enough n, consider the words of length n whose middle third consists of 0s; the palindromes among them have the form x 0^(n/3) x^R, where x^R denotes the reverse of x. The accepting Turing machine ends with an accept state on the left (at the beginning of the tape). A Turing-machine computation on a given word yields, for each location (the boundary between adjacent cells), a sequence of crossings from left to right and from right to left, each crossing in a particular state of the finite control. Either all positions in the middle third of a candidate word have a crossing sequence of length Ω(n), giving a total computation time of Ω(n²), or some position in the middle third has a crossing sequence of length o(n). In the latter case, the word (if it is a palindrome) can be identified by that crossing sequence, as follows.

If another palindrome of this form (also ending in an accepting state on the left) had the same crossing sequence at that position, then the word consisting of the prefix of the original palindrome up to that position, concatenated with the suffix of the other palindrome from that position onward, would be accepted as well, although it is not a palindrome. Hence distinct accepted palindromes of this form have distinct crossing sequences at that position, and x can be reconstructed from the position and the crossing sequence. Taking a palindrome whose first third x is incompressible, so of Kolmogorov complexity about n/3, this description of x by o(n) bits is a contradiction.
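
The two branches of the argument in counting form, as a sketch (ε denotes an unspecified constant depending only on the machine):

```latex
% Branch 1: every position i in the middle third has a long crossing sequence c_i.
\[
  T\bigl(x\,0^{n/3}x^{R}\bigr) \;\ge\; \sum_{i\in\text{middle third}} |c_i|
  \;\ge\; \frac{n}{3}\cdot\varepsilon n \;=\; \Omega(n^{2}).
\]
% Branch 2: some middle position i has a short crossing sequence, |c_i| = o(n).
% Each crossing contributes O(1) bits (the state), so x is determined by i and c_i:
\[
  C(x) \;\le\; O(|c_i|) + O(\log n) \;=\; o(n),
\]
% contradicting C(x) >= n/3 - O(1) for an incompressible x.
```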

Since the overwhelming majority of binary palindromes have high Kolmogorov complexity, this gives an Ω(n²) lower bound on the average-case running time. The result of Paul, Seiferas and Simon is much more difficult and shows that Turing machines with k + 1 work tapes are more powerful than those with k work tapes in real time (that is, processing one input symbol per step). [3]

In 1984, W. Maass [16] and M. Li and P. M. B. Vitanyi [17] showed that the simulation of two work tapes by one work tape of a Turing machine takes Ω(n²) time deterministically (which is optimal, solving a 30-year open problem), and that a nearly quadratic lower bound, Ω(n²) divided by a polylogarithmic factor, also holds nondeterministically [17] (in [16], a nondeterministic bound of the same type with a different polylogarithmic factor is obtained). More results concerning tapes, stacks and queues, deterministic and nondeterministic, [17] were proven with the incompressibility method. [4]

Theory of computation

Heapsort is a sorting method, invented by J. W. J. Williams and refined by R. W. Floyd, which always runs in O(n log n) time. It was long questionable whether Floyd's method (which differs from Williams' in how the sift-down step is carried out) is better than Williams' on average, although it is better in the worst case. Using the incompressibility method, it was shown [4] that on average Williams' method uses about 2n log n comparisons while Floyd's method uses about n log n comparisons. The proof was suggested by Ian Munro.
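
A small experiment comparing the two sift-down strategies empirically (an illustrative sketch; the array size, the comparison-counting convention and the function names are choices made here, not taken from the cited analysis):

```python
import math
import random

def siftdown_williams(a, i, n, cnt):
    """Williams' sift-down: at each level compare the two children, then the
    sifted key with the larger child (about two comparisons per level)."""
    v = a[i]
    while 2 * i + 1 < n:
        j = 2 * i + 1
        if j + 1 < n:
            cnt[0] += 1
            if a[j + 1] > a[j]:
                j += 1
        cnt[0] += 1
        if v >= a[j]:
            break
        a[i] = a[j]
        i = j
    a[i] = v

def siftdown_floyd(a, i, n, cnt):
    """Floyd's sift-down: descend along the path of larger children to the
    bottom (one comparison per level), then climb back up to place the key."""
    v = a[i]
    path = [i]
    while 2 * path[-1] + 1 < n:
        j = 2 * path[-1] + 1
        if j + 1 < n:
            cnt[0] += 1
            if a[j + 1] > a[j]:
                j += 1
        path.append(j)
    k = len(path) - 1
    while k >= 1:                    # climb up until the key fits
        cnt[0] += 1
        if a[path[k]] > v:
            break
        k -= 1
    for s in range(1, k + 1):        # shift that part of the path one level up
        a[path[s - 1]] = a[path[s]]
    a[path[k]] = v

def heapsort(a, siftdown):
    cnt = [0]
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):      # build the max-heap
        siftdown(a, i, n, cnt)
    for end in range(n - 1, 0, -1):          # repeatedly extract the maximum
        a[0], a[end] = a[end], a[0]
        siftdown(a, 0, end, cnt)
    return cnt[0]

random.seed(0)
n = 1 << 14
data = list(range(n))
random.shuffle(data)
b1, b2 = data[:], data[:]
w = heapsort(b1, siftdown_williams)
f = heapsort(b2, siftdown_floyd)
assert b1 == b2 == sorted(data)
print(f"n log2 n            ~ {n * math.log2(n):,.0f}")
print(f"Williams' comparisons: {w:,}")
print(f"Floyd's comparisons:   {f:,}")
```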

Shellsort, discovered by Donald Shell in 1959, is a comparison sort which divides the list to be sorted into sublists and sorts them separately by insertion sort; the sorted sublists are then interleaved back, reconstituting a partially sorted list. This process is repeated a number of times (the number of passes p). The difficulty of analyzing the complexity of the sorting process is that it depends on the number n of keys to be sorted, on the number p of passes and on the increments h_1, …, h_p governing the scattering in each pass: the sublists in pass k consist of the keys that are h_k positions apart. Although this sorting method inspired a large number of papers, only worst-case bounds for particular increment sequences were established. For the average running time, only the average case of two-pass Shellsort (Θ(n^(5/3)) for the best choice of increments) [18] and an upper bound of O(n^(23/15)) for a particular increment sequence for three-pass Shellsort [19] were established. A general lower bound on the average running time of p-pass Shellsort was given, [20] which was the first advance in this problem in four decades. In every pass, the comparison sort moves each key over a certain distance (a path length). All these path lengths are logarithmically coded, in the correct order of passes and keys, which allows the reconstruction of the unsorted list from the sorted list. If the unsorted list is incompressible (or nearly so), then, since the sorted list has near-zero Kolmogorov complexity, the total code length of the path lengths must be at least as large as the Kolmogorov complexity of the original list, about n log n bits. The sum of the path lengths corresponds to the running time, and in this argument the running time is lower-bounded by Ω(p·n^(1+1/p)). This was improved to a lower bound of

Ω(n·(h_0/h_1 + h_1/h_2 + ⋯ + h_{p−1}/h_p)),

where h_1 > h_2 > ⋯ > h_p = 1 is the increment sequence and h_0 = n. [21] This implies, for example, the Jiang-Li-Vitanyi lower bound Ω(p·n^(1+1/p)) for all p-pass increment sequences and improves that lower bound for particular increment sequences; the Janson-Knuth upper bound is matched by a lower bound for the increment sequence used there, showing that three-pass Shellsort with this increment sequence uses Θ(n^(23/15)) inversions.
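
A quick empirical look at these quantities for geometric increments (an illustrative sketch; the input size, the rounding of the increments and the seed are arbitrary choices). For p passes with increments near n^((p−1)/p), …, n^(1/p), 1, both the bound above and the Jiang-Li-Vitanyi bound are of order p·n^(1+1/p).

```python
import random

def shellsort(a, gaps):
    """h-sort the list for each gap in turn; return the number of key comparisons."""
    cmps = 0
    for h in gaps:
        for i in range(h, len(a)):
            v = a[i]
            j = i
            while j >= h:
                cmps += 1                # compare v with the key h positions back
                if a[j - h] <= v:
                    break
                a[j] = a[j - h]          # shift the larger key forward
                j -= h
            a[j] = v
    return cmps

random.seed(0)
n, p = 30_000, 3
gaps = [max(1, round(n ** ((p - k) / p))) for k in range(1, p + 1)]  # ~ n^(2/3), n^(1/3), 1
gaps[-1] = 1                             # the final pass must use increment 1
a = list(range(n))
random.shuffle(a)
c = shellsort(a, gaps)
assert a == sorted(a)
print("gaps:", gaps)
print(f"comparisons: {c:,}   p * n^(1 + 1/p) ~ {p * n ** (1 + 1 / p):,.0f}")
```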

Another example is the following: using the incompressibility method, it was shown that there exist n × n Boolean matrices in which every sufficiently large submatrix has rank close to the maximum possible.

Logic

According to Gödel's first incompleteness theorem, every consistent formal system with computably enumerable theorems (or proofs) that is strong enough to contain Peano arithmetic has true but unprovable statements. This can also be shown by the incompressibility method, assuming the system proves only true statements: every such formal system can be described finitely, say in f bits. Since the system contains arithmetic, a statement of the form "the Kolmogorov complexity of the string x is at least n" can be expressed in it, and such statements are true for most strings x. Given the description of the system and a natural number n much larger than f, we can search exhaustively through all proofs for a proof of a statement of this form. If one is always found, then the first string x obtained in this way is described by the description of the system together with n, that is, by about f + log n bits, which is fewer than n: a contradiction. Hence, for n large enough, some true statements of this form are unprovable in the system. [22]
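
The contradiction in compact form, as a sketch (f denotes the number of bits needed to describe the formal system together with its proof-enumeration procedure):

```latex
% If, for arbitrarily large n, some statement "C(x) >= n" were provable, then the
% first string x found by enumerating all proofs would be computable from the
% f-bit description of the system and from n, so that
\[
  C(x) \;\le\; f + \log n + O(1) \;<\; n \qquad\text{for all sufficiently large } n,
\]
% contradicting the proved (and hence true) statement C(x) >= n.
```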

Comparison with other methods

Although the probabilistic method generally shows the existence of an object with a certain property in a class, the incompressibility method tends to show that the overwhelming majority of objects in the class (the average, or the expectation) have that property. It is sometimes easy to turn a probabilistic proof into an incompressibility proof or vice versa, but in some cases it is difficult or impossible to turn a proof by incompressibility into a probabilistic or counting proof. In virtually all of the cases of Turing-machine time complexity cited above, the incompressibility method solved problems which had been open for decades, and no other proofs are known. Sometimes a proof by incompressibility can be turned into a proof by counting, as happened in the case of the general lower bound on the average-case running time of Shellsort. [20]

References

  1. A. N. Kolmogorov, "Three approaches to the definition of the concept 'quantity of information'", Probl. Peredachi Inf., 1:1(1965), 3–11.
  2. W. J. Paul, "Kolmogorov's complexity and lower bounds", pp. 325–333 in: L. Budach, ed., Proc. 2nd Int. Conf. Fund. Comput. Theory, 1979.
  3. W. J. Paul, J. I. Seiferas, J. Simon, "An information-theoretic approach to time bounds for on-line computation" (preliminary version), Proc. 12th ACM Symp. Theory Comput. (STOC), 357–367, 1980. doi:10.1016/0022-0000(81)90009-X
  4. M. Li, P. M. B. Vitanyi, An Introduction to Kolmogorov Complexity and Its Applications, Springer, 1993, 1997, 2008, Chapter 6.
  5. H. M. Buhrman, M. Li, J. T. Tromp, P. M. B. Vitanyi, "Kolmogorov random graphs and the incompressibility method", SIAM J. Comput., 29:2(1999), 590–599. doi:10.1137/S0097539797327805
  6. P. Erdős, J. Spencer, Probabilistic Methods in Combinatorics, Academic Press, 1974.
  7. M. Li, P. M. B. Vitanyi, "Kolmogorov complexity arguments in combinatorics", J. Combinatorial Theory, Series A, 66:2(1994), 226–236. doi:10.1016/0097-3165(94)90064-7
  8. P. Erdős, L. Lovász, "Problems and results on 3-chromatic hypergraphs and some related questions", in A. Hajnal, R. Rado, and V. T. Sós, eds., Infinite and Finite Sets (to Paul Erdős on his 60th birthday), North-Holland, pp. 609–627.
  9. R. A. Moser, G. Tardos, "A constructive proof of the general Lovász local lemma", Journal of the ACM (JACM), 57:2(2010), Article 11. doi:10.1145/1667053.1667060
  10. L. Fortnow, "A Kolmogorov Complexity Proof of the Lovász Local Lemma", Computational Complexity Weblog, 2 June 2009.
  11. U. Schöning, "Construction of expanders and superconcentrators using Kolmogorov complexity", Random Structures & Algorithms, 17:1(2000), 64–77. doi:10.1002/1098-2418(200008)17:1<64::AID-RSA5>3.0.CO;2-3
  12. T. Jiang, M. Li, P. M. B. Vitanyi, "The average-case area of Heilbronn-type triangles", Random Structures & Algorithms, 20:2(2002), 206–219. doi:10.1002/rsa.10024
  13. V. G. Vovk, "The law of the iterated logarithm for random Kolmogorov, or chaotic, sequences", Theory Probab. Appl., 32:3(1988), 413–426. doi:10.1137/1132061
  14. M. Zimand, "A high-low Kolmogorov complexity law equivalent to the 0–1 law", Inform. Process. Letters, 57:2(1996), 59–84. doi:10.1016/0020-0190(95)00201-4
  15. M. Li, P. M. B. Vitanyi, "Statistical properties of finite sequences with high Kolmogorov complexity", Mathematical Systems Theory, 27(1994), 365–376. doi:10.1007/BF01192146
  16. W. Maass, "Combinatorial lower bound arguments for deterministic and nondeterministic Turing machines", Trans. Amer. Math. Soc., 292(1985), 675–693. doi:10.1090/S0002-9947-1985-0808746-4
  17. M. Li, P. M. B. Vitanyi, "Tape versus queue and stacks: The lower bounds", Information and Computation, 78:1(1988), 56–85. doi:10.1016/0890-5401(88)90003-X
  18. D. E. Knuth, Sorting and Searching (The Art of Computer Programming, Vol. 3), 2nd ed., Addison-Wesley, 1998, pp. 83–95. ISBN 0201896850
  19. S. Janson, D. E. Knuth, "Shellsort with three increments", Random Structures & Algorithms, 10:1–2(1997), 125–142. arXiv:cs/9608105
  20. T. Jiang, M. Li, P. M. B. Vitanyi, "A lower bound on the average-case complexity of Shellsort", Journal of the ACM (JACM), 47:5(2000), 905–911. doi:10.1145/355483.355488
  21. P. M. B. Vitanyi, "On the average-case complexity of Shellsort", Random Structures & Algorithms, 52:2(2018), 354–363. doi:10.1002/rsa.20737
  22. G. J. Chaitin, Algorithmic Information Theory, Cambridge University Press, 1987.