Locally testable code

A locally testable code is a type of error-correcting code for which one can determine whether a string is a codeword by looking at a small (frequently constant) number of bits of the string. In some situations, it is useful to know whether data is corrupted without decoding all of it, so that appropriate action can be taken in response. For example, in communication, if the receiver encounters a corrupted codeword, it can request that the data be re-sent, improving the accuracy of the received data. Similarly, in data storage, these codes allow damaged data to be recovered and rewritten properly.

In contrast, locally decodable codes use a small number of bits of the codeword to probabilistically recover the original information. The fraction of errors determines how likely it is that the decoder correctly recovers the original bit; however, not all locally decodable codes are locally testable. [1]

Clearly, any valid codeword should be accepted as a codeword, but a string that is not a codeword could differ from some codeword in only one bit, and detecting that would require many (certainly more than a constant number of) probes. To account for this, testing failure is only required when the string differs from the code in at least a set fraction of its bits. This implies that the codewords must be longer than the original messages, i.e., some redundancy must be added.

Definition

To measure the distance between two strings $x, y \in \{0,1\}^n$, the Hamming distance is used:

$\Delta(x, y) = |\{ i : x_i \neq y_i \}|$

The distance of a string $x$ from a code $C$ is computed by:

$\Delta(x, C) = \min_{c \in C} \Delta(x, c)$

Relative distances are computed as a fraction of the number of bits:

$\delta(x, y) = \frac{\Delta(x, y)}{n} \qquad \delta(x, C) = \frac{\Delta(x, C)}{n}$
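
These definitions translate directly into code; a minimal Python sketch (the function names are my own, and enumerating the whole code like this is only feasible for tiny examples):

```python
def hamming_distance(x, y):
    # Number of positions at which the equal-length strings x and y differ.
    assert len(x) == len(y)
    return sum(1 for a, b in zip(x, y) if a != b)

def distance_to_code(x, code):
    # Distance from x to the nearest codeword of the (finite, enumerable) code.
    return min(hamming_distance(x, c) for c in code)

def relative_distance_to_code(x, code):
    # The same distance expressed as a fraction of the string length.
    return distance_to_code(x, code) / len(x)
```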

A code $C \subseteq \{0,1\}^n$ is called $q$-local $\epsilon$-testable if there exists a Turing machine M that, given random access to an input $x$, makes at most $q$ non-adaptive queries into $x$ and satisfies the following: [2]

  • For any $x \in C$, $\Pr[M^x \text{ accepts}] = 1$. In other words, M accepts given access to any codeword of C.
  • For any $x$ such that $\delta(x, C) > \epsilon$, $\Pr[M^x \text{ accepts}] \leq \frac{1}{2}$. M must reject strings $\epsilon$-far from C at least half the time.

Also, the rate of a code is the ratio between its message length and codeword length, $\frac{k}{n}$, where a message of length $k$ is encoded into a codeword of length $n$.
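
As a concrete (if naive) illustration of the two conditions above, the sketch below estimates a tester's acceptance probability empirically; the names tester and acceptance_probability are illustrative, not taken from the article or any library, and real testers are analyzed mathematically rather than by sampling.

```python
def acceptance_probability(tester, x, trials=2000):
    """Estimate Pr[tester accepts] when given random access to the string x.

    `tester` models the machine M: it receives a query callback (random access
    to x) and the length of x, makes at most q non-adaptive queries through
    that callback, and returns True (accept) or False (reject).
    """
    accepts = sum(tester(lambda i: x[i], len(x)) for _ in range(trials))
    return accepts / trials
```

Under this modeling, a code C is $q$-local $\epsilon$-testable if some such tester accepts every codeword with probability 1 and accepts any string at relative distance more than $\epsilon$ from C with probability at most 1/2.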

Limits

It remains an open question whether there are any locally testable codes of linear size, but there are several constructions that are considered "nearly linear": [3]

  1. Polynomial arbitrarily close to linear; for any $\epsilon > 0$, $n = k^{1+\epsilon}$.
  2. Functions of the form $n = k^{1+\epsilon(k)}$, where $\epsilon(k)$ is a function tending toward 0. This makes n closer to linear as k increases. For example:
    • $n = k \cdot 2^{(\log k)^{c}}$ for some $c \in (0, 1)$
    • $n = k^{1 + 1/\log \log k}$ for $\epsilon(k) = 1/\log \log k$

These have both been achieved, even with constant query complexity and a binary alphabet, such as with $n = k \cdot 2^{(\log k)^{\epsilon}}$ for any $\epsilon > 0$. The next nearly linear goal is linear up to a polylogarithmic factor, $n = k \cdot \operatorname{poly}(\log k)$. Nobody has yet come up with a locally testable code that satisfies this constraint. [3]

In November 2021, two preprints reported [4] [5] [6] [7] the first polynomial-time construction of "$c^3$-LTCs", namely locally testable codes with constant rate, constant distance, and constant locality.

Connection with probabilistically checkable proofs

Locally testable codes have a lot in common with probabilistically checkable proofs (PCPs). This should be apparent from the similarities of their construction. In both, we are given $q$ random nonadaptive queries into a large string; if the string is valid, we must accept with probability 1, and if not, we must accept no more than half the time. The major difference is that a PCP verifier is interested in accepting an input $x$ if there exists a proof $w$ so that $M^w$ accepts $x$. A locally testable code, on the other hand, accepts $w$ if it is part of the code. Many things can go wrong in assuming a PCP proof encodes a locally testable code. For example, the PCP definition says nothing about invalid proofs, only invalid inputs.

Despite this difference, locally testable codes and PCPs are similar enough that frequently to construct one, a prover will construct the other along the way. [8]

Examples

Hadamard code

One of the most famous error-correcting codes, the Hadamard code, is a locally testable code. A message $x$ is encoded in the Hadamard code as the linear function $f(y) = \langle x, y \rangle \pmod 2$, i.e., $\sum_i x_i y_i \bmod 2$. This requires listing out the value of this function for every possible $y$, which requires exponentially more bits than the input. To test if a string $w$ is a codeword of the Hadamard code, all we have to do is test whether the function it encodes is linear. This means simply checking whether $w(y) + w(z) = w(y \oplus z) \pmod 2$ for uniformly random vectors $y$ and $z$ (where $\oplus$ denotes bitwise XOR).
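
To make the encoding and the three-query linearity test concrete, here is a small Python sketch (illustrative only; the function names are my own):

```python
import random

def bits(y, k):
    # The k-bit binary representation of the integer y, least significant bit first.
    return [(y >> j) & 1 for j in range(k)]

def hadamard_encode(x):
    # Encode a k-bit message x (a list of 0s and 1s) as the table of <x, y> mod 2
    # over all 2**k vectors y, so the codeword has length 2**k.
    k = len(x)
    return [sum(xi * yi for xi, yi in zip(x, bits(y, k))) % 2 for y in range(2 ** k)]

def blr_test(w, k):
    # One round of the three-query linearity test on a table w of length 2**k:
    # pick uniformly random y and z and check w(y) + w(z) = w(y XOR z) (mod 2).
    y = random.randrange(2 ** k)
    z = random.randrange(2 ** k)
    return (w[y] + w[z]) % 2 == w[y ^ z]
```

Every table produced by hadamard_encode passes blr_test with probability 1; the analysis below bounds how often a table that is far from every codeword can pass.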

It is easy to see that for any valid codeword $w$, this equation is true, as that is the definition of a linear function. Somewhat harder, however, is showing that a string that is $\delta$-far from $C$ has its acceptance probability bounded in terms of $\delta$. One bound is found by the direct approach of approximating the chance of exactly one of the three probes yielding an incorrect result. Let A, B, and C be the events of $w(y)$, $w(z)$, and $w(y \oplus z)$ disagreeing with the nearest codeword; each happens on a $\delta$ fraction of positions, and if exactly one of them occurs the test must reject. Let E be the event of exactly one of these occurring. Treating the three probes as roughly independent, this comes out to

$\Pr[E] \approx 3\delta(1-\delta)^2$

This works for small $\delta$, but shortly after $\delta = 1/3$ the expression begins to decrease, so on its own it gives no useful guarantee for strings that are very far from the code. With additional work, it can be shown that the rejection probability stays bounded below by a positive constant for all larger $\delta$.

For any given $\delta$, this only has a constant chance of false positives, so we can simply run the test a constant number of times to bring the probability of accepting a far string below 1/2. [3]
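
Bringing the false-accept probability below 1/2 is then just independent repetition; a minimal sketch, reusing the illustrative blr_test above:

```python
def repeated_blr_test(w, k, repetitions=10):
    # Accept only if every independent round accepts. Each round rejects a
    # delta-far table with at least some constant probability, so a constant
    # number of repetitions drives the overall false-accept probability below 1/2.
    return all(blr_test(w, k) for _ in range(repetitions))
```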

Long code

The Long code is another code with very large blowup which is close to locally testable. Given an input $i \in \{1, \dots, k\}$ (note, this takes $\log k$ bits to represent), the function that returns the $i$-th bit of its argument, $f_i(x) = x_i$, is evaluated on all possible $k$-bit inputs $x \in \{0,1\}^k$, and the codeword is the concatenation of these values (of length $2^k$). The way to locally test this with some errors is to pick a uniformly random input $x$ and set $y = x$, but with a small chance $\epsilon$ of flipping each bit independently. Accept a function $f$ as a codeword if $f(x) = f(y)$. If $f$ is a codeword, $f(x) = x_i$, so this will accept as long as bit $i$ of $x$ was unchanged, which happens with probability $1 - \epsilon$. This violates the requirement that codewords are always accepted, but may be good enough for some needs. [9]
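
A sketch of this encoding and its noisy two-query test in Python (illustrative names; eps plays the role of the per-bit flip probability $\epsilon$ above):

```python
import random

def long_code_encode(i, k):
    # Long-code encoding of an index i in {0, ..., k-1}: the truth table of the
    # dictator function x -> x_i over all 2**k inputs x, giving length 2**k.
    return [(x >> i) & 1 for x in range(2 ** k)]

def noisy_dictator_test(w, k, eps=0.1):
    # One round of the test: pick a uniformly random x, flip each bit of x
    # independently with probability eps to obtain y, and accept iff w(x) == w(y).
    x = random.randrange(2 ** k)
    noise = sum(1 << j for j in range(k) if random.random() < eps)
    y = x ^ noise
    return w[x] == w[y]
```

For a true codeword the test fails only when the noise happens to flip bit i of x, so it accepts with probability 1 − eps, matching the discussion above.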

Other locally testable codes include Reed–Muller codes (see locally decodable codes for a decoding algorithm), Reed–Solomon codes, and the short code.


References

  1. Kaufman, Tali; Viderman, Michael. "Locally Testable vs. Locally Decodable Codes".
  2. Ben-Sasson, Eli; Sudan, Madhu. "Robust Locally Testable Codes and Products of Codes" (PDF).
  3. Goldreich, Oded (2005). "Short Locally Testable Codes and Proofs (Survey)". CiteSeerX 10.1.1.110.2530.
  4. Panteleev, Pavel; Kalachev, Gleb (2021-11-05). "Asymptotically Good Quantum and Locally Testable Classical LDPC Codes". arXiv:2111.03654 [cs.IT].
  5. Dinur, Irit; Evra, Shai; Livne, Ron; Lubotzky, Alexander; Mozes, Shahar (2021-11-08). "Locally Testable Codes with constant rate, distance, and locality". arXiv:2111.04808 [cs.IT].
  6. Rorvig, Mordechai (2021-11-24). "Researchers Defeat Randomness to Create Ideal Code". Quanta Magazine. Retrieved 2021-11-24.
  7. Rorvig, Mordechai (2022-01-06). "Qubits Can Be as Safe as Bits, Researchers Show". Quanta Magazine. Retrieved 2022-02-02.
  8. Cheraghchi, Mahdi. "Locally Testable Codes".
  9. Kol, Gillat; Raz, Ran. "Bounds on Locally Testable Codes with Unique Tests" (PDF).