In algorithmic information theory (a subfield of computer science and mathematics), the Kolmogorov complexity of an object, such as a piece of text, is the length of a shortest computer program (in a predetermined programming language) that produces the object as output. It is a measure of the computational resources needed to specify the object, and is also known as algorithmic complexity, Solomonoff–Kolmogorov–Chaitin complexity, program-size complexity, descriptive complexity, or algorithmic entropy. It is named after Andrey Kolmogorov, who first published on the subject in 1963, [1] [2] and is a generalization of classical information theory.
The notion of Kolmogorov complexity can be used to state and prove impossibility results akin to Cantor's diagonal argument, Gödel's incompleteness theorem, and Turing's halting problem. In particular, no program P computing a lower bound for each text's Kolmogorov complexity can return a value essentially larger than P's own length (see section § Chaitin's incompleteness theorem); hence no single program can compute the exact Kolmogorov complexity for infinitely many texts. Kolmogorov complexity is the length of the ultimately compressed version of a file (i.e., anything which can be put in a computer). Formally, it is the length of a shortest program from which the file can be reconstructed. While Kolmogorov complexity is uncomputable, various approaches have been proposed and reviewed. [3]
Consider the following two strings of 32 lowercase letters and digits:
abababababababababababababababab
and
4c1j5b2p0cv4w1x8rx2y39umgw5q85s7
The first string has a short English-language description, namely "write ab 16 times", which consists of 17 characters. The second one has no obvious simple description (using the same character set) other than writing down the string itself, i.e., "write 4c1j5b2p0cv4w1x8rx2y39umgw5q85s7" which has 38 characters. Hence the operation of writing the first string can be said to have "less complexity" than writing the second.
More formally, the complexity of a string is the length of the shortest possible description of the string in some fixed universal description language (the sensitivity of complexity relative to the choice of description language is discussed below). It can be shown that the Kolmogorov complexity of any string cannot be more than a few bytes larger than the length of the string itself. Strings like the abab example above, whose Kolmogorov complexity is small relative to the string's size, are not considered to be complex.
The Kolmogorov complexity can be defined for any mathematical object, but for simplicity the scope of this article is restricted to strings. We must first specify a description language for strings. Such a description language can be based on any computer programming language, such as Lisp, Pascal, or Java. If P is a program which outputs a string x, then P is a description of x. The length of the description is just the length of P as a character string, multiplied by the number of bits in a character (e.g., 7 for ASCII).
We could, alternatively, choose an encoding for Turing machines, where an encoding is a function which associates to each Turing Machine M a bitstring <M>. If M is a Turing Machine which, on input w, outputs string x, then the concatenated string <M> w is a description of x. For theoretical analysis, this approach is more suited for constructing detailed formal proofs and is generally preferred in the research literature. In this article, an informal approach is discussed.
Any string s has at least one description. For example, the second string above is output by the pseudo-code:
function GenerateString2()
    return "4c1j5b2p0cv4w1x8rx2y39umgw5q85s7"
whereas the first string is output by the (much shorter) pseudo-code:
function GenerateString1()
    return "ab" × 16
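To make the comparison concrete in an actual language (Python is used here purely as an illustration; the exact character counts depend on the language chosen), the two descriptions and their lengths could look like this:

# Two Python programs that output the example strings; their lengths differ sharply.
program1 = 'print("ab" * 16)'                              # 16 characters
program2 = 'print("4c1j5b2p0cv4w1x8rx2y39umgw5q85s7")'     # 41 characters
print(len(program1), len(program2))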
If a description d(s) of a string s is of minimal length (i.e., using the fewest bits), it is called a minimal description of s, and the length of d(s) (i.e. the number of bits in the minimal description) is the Kolmogorov complexity of s, written K(s). Symbolically, K(s) = |d(s)|.
The length of the shortest description will depend on the choice of description language; but the effect of changing languages is bounded (a result called the invariance theorem).
There are two definitions of Kolmogorov complexity: plain and prefix-free. The plain complexity is the minimal description length of any program, denoted C(x), while the prefix-free complexity is the minimal description length of any program encoded in a prefix-free code, denoted K(x). The plain complexity is more intuitive, but the prefix-free complexity is easier to study.
By default, all equations hold only up to an additive constant. For example, f(x) = g(x) really means that f(x) = g(x) + O(1); that is, there is a constant c such that |f(x) − g(x)| ≤ c for all x.
Let U : {0,1}* → {0,1}* be a computable function mapping finite binary strings to binary strings. It is a universal function if, and only if, for any computable f : {0,1}* → {0,1}*, we can encode f in a "program" s_f such that U(s_f x) = f(x) for all x. We can think of U as a program interpreter, which takes in an initial segment describing the program, followed by data that the program should process.
One problem with plain complexity is that C(xy) is not bounded by C(x) + C(y) plus a constant, because intuitively speaking, there is no general way to tell where to divide an output string just by looking at the concatenated string. We can divide it by specifying the length of x or y, but that would take O(min(log2 |x|, log2 |y|)) extra symbols. Indeed, for any c > 0 there exist x and y such that C(xy) ≥ C(x) + C(y) + c. [4]
Typically, inequalities with plain complexity have a term like O(min(log2 |x|, log2 |y|)) on one side, whereas the same inequalities with prefix-free complexity have only O(1).
The main problem with plain complexity is that something extra is sneaked into a program. A program not only represents something by its code, but also represents its own length. In particular, a program of length n can represent, simply through its length, a number as large as n – roughly log2(n) extra bits of information. Stated another way, it is as if we were using a termination symbol to denote where a word ends, and so we are using not 2 symbols, but 3. To fix this defect, we introduce the prefix-free Kolmogorov complexity. [5]
A prefix-free code is a subset of {0,1}* such that given any two different words in the set, neither is a prefix of the other. The benefit of a prefix-free code is that we can build a machine that reads words from the code forward in one direction, and as soon as it reads the last symbol of the word, it knows that the word is finished, and does not need to backtrack or use a termination symbol.
Define a prefix-free Turing machine to be a Turing machine that comes with a prefix-free code, such that the Turing machine can read any string from the code in one direction, and stop reading as soon as it reads the last symbol. Afterwards, it may compute on a work tape and write to a write tape, but it cannot move its read-head anymore.
This gives us the following formal way to describe K. [6]
Note that some universal Turing machines may not be programmable with prefix codes. We must pick only a prefix-free universal Turing machine.
The prefix-free complexity of a string x is the length of a shortest program p (a word of the machine's prefix-free code) that makes the machine output x:
K(x) = min { |p| : U(p) = x }.
There are some description languages which are optimal, in the following sense: given any description of an object in a description language, said description may be used in the optimal description language with a constant overhead. The constant depends only on the languages involved, not on the description of the object, nor the object being described.
Here is an example of an optimal description language. A description will have two parts:
In more technical terms, the first part of a description is a computer program (specifically: a compiler for the object's language, written in the description language), with the second part being the input to that computer program which produces the object as output.
The invariance theorem follows: Given any description language L, the optimal description language is at least as efficient as L, with some constant overhead.
Proof: Any description D in L can be converted into a description in the optimal language by first describing L as a computer program P (part 1), and then using the original description D as input to that program (part 2). The total length of this new description D′ is (approximately) |D′| = |P| + |D|.
The length of P is a constant that doesn't depend on D. So, there is at most a constant overhead, regardless of the object described. Therefore, the optimal language is universal up to this additive constant.
Theorem: If K1 and K2 are the complexity functions relative to Turing complete description languages L1 and L2, then there is a constant c – which depends only on the languages L1 and L2 chosen – such that |K1(s) − K2(s)| ≤ c for all strings s.
Proof: By symmetry, it suffices to prove that there is some constant c such that for all strings s, K1(s) ≤ K2(s) + c.
Now, suppose there is a program in the language L1 which acts as an interpreter for L2:
function InterpretLanguage(string p)
where p is a program in L2. The interpreter is characterized by the following property:
Running InterpretLanguage on input p returns the result of running p. Thus, if P is a program in L2 which is a minimal description of s, then InterpretLanguage(P) returns the string s. The length of this description of s is the sum of:
1. The length of the program InterpretLanguage, which we can take to be the constant c.
2. The length of P, which by definition is K2(s).
This proves the desired upper bound.
Algorithmic information theory is the area of computer science that studies Kolmogorov complexity and other complexity measures on strings (or other data structures).
The concept and theory of Kolmogorov complexity is based on a crucial theorem first discovered by Ray Solomonoff, who published it in 1960, describing it in "A Preliminary Report on a General Theory of Inductive Inference" [7] as part of his invention of algorithmic probability. He gave a more complete description in his 1964 publications, "A Formal Theory of Inductive Inference," Part 1 and Part 2 in Information and Control. [8] [9]
Andrey Kolmogorov later independently published this theorem in Problems Inform. Transmission [10] in 1965. Gregory Chaitin also presents this theorem in J. ACM – Chaitin's paper was submitted October 1966 and revised in December 1968, and cites both Solomonoff's and Kolmogorov's papers. [11]
The theorem says that, among algorithms that decode strings from their descriptions (codes), there exists an optimal one. This algorithm, for all strings, allows codes as short as allowed by any other algorithm up to an additive constant that depends on the algorithms, but not on the strings themselves. Solomonoff used this algorithm and the code lengths it allows to define a "universal probability" of a string on which inductive inference of the subsequent digits of the string can be based. Kolmogorov used this theorem to define several functions of strings, including complexity, randomness, and information.
When Kolmogorov became aware of Solomonoff's work, he acknowledged Solomonoff's priority. [12] For several years, Solomonoff's work was better known in the Soviet Union than in the Western World. The general consensus in the scientific community, however, was to associate this type of complexity with Kolmogorov, who was concerned with randomness of a sequence, while Algorithmic Probability became associated with Solomonoff, who focused on prediction using his invention of the universal prior probability distribution. The broader area encompassing descriptional complexity and probability is often called Kolmogorov complexity. The computer scientist Ming Li considers this an example of the Matthew effect: "...to everyone who has, more will be given..." [13]
There are several other variants of Kolmogorov complexity or algorithmic information. The most widely used one is based on self-delimiting programs, and is mainly due to Leonid Levin (1974).
An axiomatic approach to Kolmogorov complexity based on Blum axioms (Blum 1967) was introduced by Mark Burgin in the paper presented for publication by Andrey Kolmogorov. [14]
We write K(x, y) to mean K((x, y)), where (x, y) means some fixed way to code for a tuple of strings x and y.
We omit additive factors of O(log); that is, the (in)equalities in this section hold up to additive logarithmic terms. This section is based on [6].
Theorem. C(x) ≤ K(x) ≤ C(x) + 2 log2 C(x).
Proof. Take any program for the universal Turing machine used to define plain complexity, and convert it to a prefix-free program by first coding the length of the program in binary, then converting that length to a prefix-free code. For example, suppose the program has length 9 (binary 1001); then we can convert it as follows:
1001 → 11 00 00 11 → 11 00 00 11 01
where we double each digit, then add a termination code. The prefix-free universal Turing machine can then read in any program for the other machine as follows:
[simulator] [prefix-free coded length of the program] [the program itself]
The first part programs the machine to simulate the other machine, and is a constant overhead O(1). The second part has length at most 2 log2 C(x) + O(1). The third part has length C(x).
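For illustration, here is a small sketch of such a self-delimiting length header (the bit-doubling and the "01" terminator are one concrete choice of termination code, assumed here for the example):

# Sketch of the self-delimiting encoding used in the proof: the program's length is
# written in binary, each bit is doubled, and "01" marks the end of the header. A decoder
# can then read a program from a stream without any separate end-of-input marker.
def encode(program_bits: str) -> str:
    length_in_binary = bin(len(program_bits))[2:]        # e.g. 9 -> "1001"
    header = "".join(b * 2 for b in length_in_binary)    # "11000011"
    return header + "01" + program_bits                  # terminator, then the program

def decode(stream: str) -> str:
    # Read doubled bits until the "01" terminator, recover the length, then read exactly
    # that many bits of program; anything after is untouched (prefix-free behaviour).
    i, length_bits = 0, ""
    while stream[i:i+2] != "01":
        length_bits += stream[i]
        i += 2
    i += 2                                               # skip the terminator
    n = int(length_bits, 2)
    return stream[i:i+n]

p = "110100110"                                          # a 9-bit "program"
assert decode(encode(p) + "0110") == p                   # trailing bits are never read

The header costs about twice the binary length of the program's length plus a constant, which is where the 2 log2 C(x) term in the theorem comes from.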
Theorem: There exists a constant c such that C(x) ≤ |x| + c for every string x; more succinctly, C(x) ≤ |x|. Similarly, K(x) ≤ |x| + 2 log2 |x|, and K(x | |x|) ≤ |x|.
Proof. For the plain complexity, just write a program that simply copies the input to the output. For the prefix-free complexity, we need to first describe the length of the string, before writing out the string itself.
Theorem. (extra information bounds, subadditivity) K(x | y) ≤ K(x) ≤ K(x, y) ≤ K(x) + K(y | x) ≤ K(x) + K(y).
Note that there is no general way to compare K(xy) with K(x | y), K(x), K(y | x), or K(y): there are strings xy such that the whole string xy is easy to describe, but its substrings are very hard to describe.
Theorem. (symmetry of information) K(x, y) = K(x) + K(y | x, K(x)) = K(y) + K(x | y, K(y)).
Proof. One side (≤) is simple. For the other side (≥), we need to use a counting argument (page 38 [15]).
Theorem. (information non-increase) For any computable function f, we have K(f(x)) ≤ K(x) + K(f).
Proof. Program the Turing machine to read two subsequent programs, one describing the function f and one describing the string x. Then run both programs on the work tape to produce f(x), and write it out.
At first glance it might seem trivial to write a program which can compute K(s) for any s, such as the following:
function KolmogorovComplexity(string s)
    for i = 1 to infinity:
        for each string p of length exactly i
            if isValidProgram(p) and evaluate(p) == s
                return i
This program iterates through all possible programs (by iterating through all possible strings and only considering those which are valid programs), starting with the shortest. Each program is executed to find the result produced by that program, comparing it to the input s. If the result matches then the length of the program is returned.
However this will not work because some of the programs p tested will not terminate, e.g. if they contain infinite loops. There is no way to avoid all of these programs by testing them in some way before executing them due to the non-computability of the halting problem.
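Since the naive search cannot be salvaged, a practical variant has to impose a time limit on each candidate program; the sketch below (illustrative only, and hopelessly slow beyond tiny search bounds) does exactly that, so at best it returns an upper bound on the complexity relative to this particular (Python) description language:

# Illustrative only: a time-limited variant of the naive search. The timeout sidesteps
# non-halting programs, but slow programs are then misclassified, so any value returned
# is merely an upper bound with respect to this description language.
import itertools, subprocess, sys

ALPHABET = sorted(set('print("ab"*16)'))        # tiny alphabet, just to keep the demo finite

def output_of(program, timeout=0.5):
    try:
        result = subprocess.run([sys.executable, "-c", program],
                                capture_output=True, text=True, timeout=timeout)
        return result.stdout.rstrip("\n") if result.returncode == 0 else None
    except subprocess.TimeoutExpired:
        return None                              # treated as if the program never halts

def complexity_upper_bound(s, max_length=4):     # real programs are far longer than this
    for length in range(1, max_length + 1):
        for chars in itertools.product(ALPHABET, repeat=length):
            program = "".join(chars)
            if output_of(program) == s:
                return length                    # the first match is a shortest one found
    return None                                  # nothing found within the search bounds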
What is more, no program at all can compute the function K, be it ever so sophisticated. This is proven in the following.
Theorem: There exist strings of arbitrarily large Kolmogorov complexity. Formally: for each natural number n, there is a string s with K(s) ≥ n. [note 1]
Proof: Otherwise all of the infinitely many possible finite strings could be generated by the finitely many [note 2] programs with a complexity below n bits.
Theorem: K is not a computable function. In other words, there is no program which takes any string s as input and produces the integer K(s) as output.
The following proof by contradiction uses a simple Pascal-like language to denote programs; for sake of proof simplicity assume its description (i.e. an interpreter) to have a length of 1400000 bits. Assume for contradiction there is a program
function KolmogorovComplexity(string s)
which takes as input a string s and returns K(s). All programs are of finite length so, for sake of proof simplicity, assume it to be 7000000000 bits. Now, consider the following program of length 1288 bits:
function GenerateComplexString()
    for i = 1 to infinity:
        for each string s of length exactly i
            if KolmogorovComplexity(s) ≥ 8000000000
                return s
Using KolmogorovComplexity as a subroutine, the program tries every string, starting with the shortest, until it returns a string with Kolmogorov complexity at least 8000000000 bits, [note 3] i.e. a string that cannot be produced by any program shorter than 8000000000 bits. However, the overall length of the above program that produced s is only 7001401288 bits, [note 4] which is a contradiction. (If the code of KolmogorovComplexity is shorter, the contradiction remains. If it is longer, the constant used in GenerateComplexString can always be changed appropriately.) [note 5]
The above proof uses a contradiction similar to that of the Berry paradox: "The smallest positive integer that cannot be defined in fewer than twenty English words". It is also possible to show the non-computability of K by reduction from the non-computability of the halting problem H, since K and H are Turing-equivalent. [16]
There is a corollary, humorously called the "full employment theorem" in the programming language community, stating that there is no perfect size-optimizing compiler.
The chain rule [17] for Kolmogorov complexity states that there exists a constant c such that for all X and Y:
K(X, Y) ≤ K(X) + K(Y | X) + c·log2(K(X, Y)).
It states that the shortest program that reproduces X and Y is no more than a logarithmic term larger than a program to reproduce X and a program to reproduce Y given X. Using this statement, one can define an analogue of mutual information for Kolmogorov complexity.
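Using the chain rule together with the symmetry of information, one common form of this analogue (stated up to the same logarithmic precision) is I(X : Y) = K(X) + K(Y) − K(X, Y), which coincides with K(Y) − K(Y | X) and with K(X) − K(X | Y) up to logarithmic terms, mirroring the symmetric role of mutual information in Shannon's theory.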
It is straightforward to compute upper bounds for K(s) – simply compress the string s with some method, implement the corresponding decompressor in the chosen language, concatenate the decompressor to the compressed string, and measure the length of the resulting string – concretely, the size of a self-extracting archive in the given language.
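A minimal sketch of this idea, using zlib purely as an example compressor (the constant size of the decompressor is ignored here, so the figures are only crude, language-relative upper bounds):

# Compressed length as a rough upper bound on K(s): the repetitive example string shrinks,
# while the random-looking one does not. zlib stands in for "some method"; it is not optimal.
import zlib

for s in (b"ab" * 16, b"4c1j5b2p0cv4w1x8rx2y39umgw5q85s7"):
    print(len(s), "bytes raw ->", len(zlib.compress(s, 9)), "bytes compressed")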
A string s is compressible by a number c if it has a description whose length does not exceed |s| − c bits. This is equivalent to saying that K(s) ≤ |s| − c. Otherwise, s is incompressible by c. A string incompressible by 1 is said to be simply incompressible – by the pigeonhole principle, which applies because every compressed string maps to only one uncompressed string, incompressible strings must exist, since there are 2^n bit strings of length n, but only 2^n − 1 shorter strings, that is, strings of length less than n (i.e. with length 0, 1, ..., n − 1). [note 6]
For the same reason, most strings are complex in the sense that they cannot be significantly compressed – their K(s) is not much smaller than |s|, the length of s in bits. To make this precise, fix a value of n. There are 2^n bitstrings of length n. The uniform probability distribution on the space of these bitstrings assigns exactly equal weight 2^(−n) to each string of length n.
Theorem: With the uniform probability distribution on the space of bitstrings of length n, the probability that a string is incompressible by c is at least 1 − 2^(−c+1) + 2^(−n).
To prove the theorem, note that the number of descriptions of length not exceeding n − c is given by the geometric series: 1 + 2 + 4 + ... + 2^(n−c) = 2^(n−c+1) − 1.
There remain at least 2^n − 2^(n−c+1) + 1 bitstrings of length n that are incompressible by c. To determine the probability, divide by 2^n.
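For one illustrative choice of n and c, the counting works out as follows:

# Worked instance of the counting argument above: descriptions of length at most n − c
# account for at most 2^(n−c+1) − 1 strings, so the remaining strings of length n are
# incompressible by c.
n, c = 12, 3
descriptions = sum(2**i for i in range(n - c + 1))         # 1 + 2 + ... + 2^(n−c)
assert descriptions == 2**(n - c + 1) - 1
incompressible = 2**n - descriptions
print(incompressible, "of", 2**n, "strings of length", n, "are incompressible by", c)
print("probability at least", 1 - 2**(-c + 1) + 2**(-n))   # equals 3073/4096 here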
By the above theorem (§ Compression), most strings are complex in the sense that they cannot be described in any significantly "compressed" way. However, it turns out that the fact that a specific string is complex cannot be formally proven, if the complexity of the string is above a certain threshold. The precise formalization is as follows. First, fix a particular axiomatic system S for the natural numbers. The axiomatic system has to be powerful enough so that, to certain assertions A about complexity of strings, one can associate a formula FA in S. This association must have the following property:
If FA is provable from the axioms of S, then the corresponding assertion A must be true. This "formalization" can be achieved based on a Gödel numbering.
Theorem: There exists a constant L (which only depends on S and on the choice of description language) such that there does not exist a string s for which the statement
can be proven within S. [18] [19]
Proof Idea: The proof of this result is modeled on a self-referential construction used in Berry's paradox. We first obtain a program which enumerates the proofs within S, and we specify a procedure P which takes as input an integer L and prints the strings x which occur within proofs in S of the statement K(x) ≥ L. By then setting L greater than the length of this procedure P, we obtain a contradiction: the statement K(x) ≥ L asserts that no program of length less than L prints x, yet x is printed by the procedure P, whose length is less than L. So it is not possible for the proof system S to prove K(x) ≥ L for L arbitrarily large; in particular, not for L larger than the length of the procedure P (which is finite).
Proof:
We can find an effective enumeration of all the formal proofs in S by some procedure
function NthProof(int n)
which takes as input n and outputs some proof. This function enumerates all proofs. Some of these are proofs for formulas we do not care about here, since every possible proof in the language of S is produced for some n. Some of these are complexity formulas of the form K(s) ≥ n where s and n are constants in the language of S. There is a procedure
function NthProofProvesComplexityFormula(int n)
which determines whether the nth proof actually proves a complexity formula K(s) ≥ L. The strings s, and the integer L in turn, are computable by procedure:
function StringNthProof(int n)
function ComplexityLowerBoundNthProof(int n)
Consider the following procedure:
function GenerateProvablyComplexString(int n)
    for i = 1 to infinity:
        if NthProofProvesComplexityFormula(i) and ComplexityLowerBoundNthProof(i) ≥ n
            return StringNthProof(i)
Given an n, this procedure tries every proof until it finds a string and a proof in the formal system S of the formula K(s) ≥ L for some L ≥ n; if no such proof exists, it loops forever.
Finally, consider the program consisting of all these procedure definitions, and a main call:
GenerateProvablyComplexString(n0)
where the constant n0 will be determined later on. The overall program length can be expressed as U+log2(n0), where U is some constant and log2(n0) represents the length of the integer value n0, under the reasonable assumption that it is encoded in binary digits. We will choose n0 to be greater than the program length, that is, such that n0 > U+log2(n0). This is clearly true for n0 sufficiently large, because the left hand side grows linearly in n0 whilst the right hand side grows logarithmically in n0 up to the fixed constant U.
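For an assumed, purely illustrative value of U, the least suitable n0 can be found by direct search:

# Hypothetical constant U (the length of the fixed part of the program, in bits); the loop
# finds the least n0 with n0 > U + log2(n0), which must exist because the left-hand side
# grows linearly while the right-hand side grows only logarithmically.
from math import log2

U = 1_000_000                     # assumed for illustration only
n0 = 1
while n0 <= U + log2(n0):
    n0 += 1
print(n0)                         # the least n0 exceeding U + log2(n0)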
Then no proof of the form "K(s) ≥ L" with L ≥ n0 can be obtained in S, as can be seen by an indirect argument: if ComplexityLowerBoundNthProof(i) could return a value ≥ n0, then the loop inside GenerateProvablyComplexString would eventually terminate, and that procedure would return a string s such that
K(s) ≥ n0 (by construction of GenerateProvablyComplexString)
     > U+log2(n0) (by the choice of n0)
     ≥ K(s) (since s was described by the program with that length)
This is a contradiction, Q.E.D.
As a consequence, the above program, with the chosen value of n0, must loop forever.
Similar ideas are used to prove the properties of Chaitin's constant.
The minimum message length principle of statistical and inductive inference and machine learning was developed by C.S. Wallace and D.M. Boulton in 1968. MML is Bayesian (i.e. it incorporates prior beliefs) and information-theoretic. It has the desirable properties of statistical invariance (i.e. the inference transforms with a re-parametrisation, such as from polar coordinates to Cartesian coordinates), statistical consistency (i.e. even for very hard problems, MML will converge to any underlying model) and efficiency (i.e. the MML model will converge to any true underlying model about as quickly as is possible). C.S. Wallace and D.L. Dowe (1999) showed a formal connection between MML and algorithmic information theory (or Kolmogorov complexity). [20]
Kolmogorov randomness defines a string (usually of bits) as being random if and only if every computer program that can produce that string is at least as long as the string itself. To make this precise, a universal computer (or universal Turing machine) must be specified, so that "program" means a program for this universal machine. A random string in this sense is "incompressible" in that it is impossible to "compress" the string into a program that is shorter than the string itself. For every universal computer, there is at least one algorithmically random string of each length. [21] Whether a particular string is random, however, depends on the specific universal computer that is chosen. This is because a universal computer can have a particular string hard-coded in itself, and a program running on this universal computer can then simply refer to this hard-coded string using a short sequence of bits (i.e. much shorter than the string itself).
This definition can be extended to define a notion of randomness for infinite sequences from a finite alphabet. These algorithmically random sequences can be defined in three equivalent ways. One way uses an effective analogue of measure theory; another uses effective martingales. The third way defines an infinite sequence to be random if the prefix-free Kolmogorov complexity of its initial segments grows quickly enough — there must be a constant c such that the complexity of an initial segment of length n is always at least n−c. This definition, unlike the definition of randomness for a finite string, is not affected by which universal machine is used to define prefix-free Kolmogorov complexity. [22]
For dynamical systems, the entropy rate and the algorithmic complexity of the trajectories are related by a theorem of Brudno, asserting that the equality K(x; T) = h(T) holds for almost all x. [23]
It can be shown [24] that for the output of Markov information sources, Kolmogorov complexity is related to the entropy of the information source. More precisely, the Kolmogorov complexity of the output of a Markov information source, normalized by the length of the output, converges almost surely (as the length of the output goes to infinity) to the entropy of the source.
Theorem. (Theorem 14.2.5 [25]) The conditional Kolmogorov complexity of a binary string x1...xn satisfies K(x1...xn | n) ≤ n·H(k/n) + 2 log2 n + c, where k is the number of ones in x1...xn and H is the binary entropy function (not to be confused with the entropy rate).
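A rough numerical illustration of the convergence described above, using a general-purpose compressor as a computable stand-in for the uncomputable K (so the figure is only an upper-bound proxy) and an i.i.d. source as the simplest Markov source:

# Compressed bits per source bit vs. the source entropy, for a Bernoulli(p) bit stream.
# zlib is only a proxy for K, so the measured rate sits somewhat above the entropy.
import random, zlib
from math import log2

def binary_entropy(p):
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

random.seed(0)
n, p = 200_000, 0.1
bits = "".join("1" if random.random() < p else "0" for _ in range(n))
packed = int(bits, 2).to_bytes((n + 7) // 8, "big")        # pack the bits into bytes
rate = 8 * len(zlib.compress(packed, 9)) / n               # compressed bits per source bit
print("entropy:", round(binary_entropy(p), 3), " compressed rate:", round(rate, 3))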
Computing the Kolmogorov complexity function is equivalent to deciding the halting problem.
If we have a halting oracle, then the Kolmogorov complexity of a string can be computed by simply trying every halting program, in lexicographic order, until one of them outputs the string.
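A sketch of this easy direction follows; halts and run stand for a hypothetical halting oracle and an interpreter for the chosen universal machine, and are passed in as parameters because no actual program can implement halts:

# Hypothetical sketch: 'halts(p)' is a halting oracle and 'run(p)' runs program p on the
# fixed universal machine; neither is implementable, which is the point of the reduction.
from itertools import count, product

def kolmogorov_complexity(s, halts, run):
    for length in count(1):                          # enumerate programs by length
        for bits in product("01", repeat=length):    # then lexicographically
            p = "".join(bits)
            if halts(p) and run(p) == s:             # only halting programs are executed
                return length                        # length of the first, hence shortest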
The other direction is much more involved. [26] [27] It shows that given a Kolmogorov complexity function, we can construct a function f such that f(n) ≥ BB(n) for all large n, where BB is the Busy Beaver shift function (also denoted S(n)). By modifying f at lower values of n we get an upper bound on BB, which solves the halting problem.
Consider this program p, which takes n as input and uses the complexity function K.
We prove by contradiction that f(n) ≥ BB(n) for all large n.
Let M be a Busy Beaver of length n. Consider this (prefix-free) program, which takes no input:
Let the string output by the program be x.
The program has length n + O(log2 n), where the n comes from the length of the Busy Beaver M, the O(log2 n) comes from using the (prefix-free) Elias delta code for the number n, and O(1) comes from the rest of the program. Therefore, K(x) ≤ n + O(log2 n) for all large n. Further, since there are only finitely many possible programs of length at most n + O(log2 n), the pigeonhole principle applies. By the assumption made for contradiction, every string of the relevant length has a minimal program whose running time is bounded in terms of f; thus the string x has a minimal program with such a running time, and that program has length at most n + O(log2 n). This contradicts how x was constructed.
Fix a universal Turing machine U, the same one used to define the (prefix-free) Kolmogorov complexity. Define the (prefix-free) universal probability of a string x to be
P(x) = Σ 2^(−|p|), where the sum ranges over all programs p for which U(p) = x.
In other words, it is the probability that, given a uniformly random binary stream as input, the universal Turing machine would halt after reading a certain prefix of the stream, and output x.
Note. U(p) = x does not mean that the input stream is p, but that the universal Turing machine would halt at some point after reading the initial segment p, without reading any further input, and that, when it halts, it has written x to the output tape.
Theorem. (Theorem 14.11.1 [25]) K(x) = −log2 P(x) + O(1).
The conditional Kolmogorov complexity of two strings x and y is, roughly speaking, defined as the Kolmogorov complexity K(x | y) of x given y as an auxiliary input to the procedure. [28] [29]
There is also a length-conditional complexity K(x | |x|), which is the complexity of x given the length of x as known/input. [30] [31]
Time-bounded Kolmogorov complexity is a modified version of Kolmogorov complexity where the space of programs to be searched for a solution is confined to only programs that can run within some pre-defined number of steps. [32] It is hypothesised that the possibility of the existence of an efficient algorithm for determining approximate time-bounded Kolmogorov complexity is related to the question of whether true one-way functions exist. [33] [34]
[note 3] Such a string exists by the theorem above (there are strings of arbitrarily large Kolmogorov complexity), hence the for loop will eventually terminate.
[note 5] If KolmogorovComplexity has length n bits, the constant m used in GenerateComplexString needs to be adapted to satisfy n + 1400000 + 1218 + 7·log10(m) < m, which is always possible since m grows faster than log10(m).