Online codes

In computer science, online codes are an example of rateless erasure codes. These codes can encode a message into a number of symbols such that knowledge of a sufficiently large fraction of them allows one to recover the original message with high probability. Rateless codes produce an arbitrarily large number of symbols, which can be broadcast until the receivers have enough symbols.

[Figure: High-level view of the use of online codes]

The online encoding algorithm consists of several phases. First, the message is split into n fixed-size message blocks. Then the outer encoding, an erasure code, produces auxiliary blocks that are appended to the message blocks to form a composite message.

From this the inner encoding generates check blocks. Upon receiving a certain number of check blocks, some fraction of the composite message can be recovered. Once enough has been recovered, the outer decoding can be used to recover the original message.

Detailed discussion

Online codes are parameterised by the block size and two scalars, q and ε. The authors suggest q = 3 and ε = 0.01. These parameters set the balance between the complexity and performance of the encoding. A message of n blocks can be recovered, with high probability, from (1+3ε)n check blocks. The probability of failure is (ε/2)^(q+1).
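To make the numbers concrete, the short Python sketch below simply evaluates the two expressions above for the suggested parameters; the message size n = 1000 is a hypothetical value chosen for illustration, not taken from the source.

q = 3
eps = 0.01
n = 1000   # hypothetical message size in blocks

check_blocks_needed = (1 + 3 * eps) * n      # (1 + 3*eps) * n
failure_probability = (eps / 2) ** (q + 1)   # (eps/2)^(q+1)

print(check_blocks_needed)    # approx. 1030 check blocks, i.e. about 3% reception overhead
print(failure_probability)    # approx. 6.25e-10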

Outer encoding

Any erasure code may be used as the outer encoding, but the author of online codes suggests the following.

For each message block, pseudo-randomly choose q auxiliary blocks (from a total of 0.55qεn auxiliary blocks) to attach it to. Each auxiliary block is then the XOR of all the message blocks which have been attached to it.
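A minimal Python sketch of this outer step is given below. It assumes fixed-size byte-string blocks and a shared seeded pseudo-random generator; the function name and these representation choices are illustrative assumptions, not the author's reference implementation.

import random
from math import ceil

def outer_encode(message_blocks, q=3, eps=0.01, seed=0):
    # Sketch of the outer encoding described above (illustrative only).
    # Each message block is attached to q pseudo-randomly chosen auxiliary
    # blocks, out of 0.55*q*eps*n in total, and each auxiliary block is the
    # XOR of the message blocks attached to it.
    n = len(message_blocks)
    block_size = len(message_blocks[0])
    num_aux = max(1, ceil(0.55 * q * eps * n))
    rng = random.Random(seed)          # the decoder must reuse the same seed

    aux_blocks = [bytes(block_size) for _ in range(num_aux)]
    for block in message_blocks:
        for a in rng.sample(range(num_aux), min(q, num_aux)):
            aux_blocks[a] = bytes(x ^ y for x, y in zip(aux_blocks[a], block))

    # The composite message is the message blocks followed by the auxiliary blocks.
    return message_blocks + aux_blocks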

Inner encoding

[Figure: Check blocks received against the number of message blocks recovered, for a 10,000-block message]

The inner encoding takes the composite message and generates a stream of check blocks. A check block is the XOR of all the blocks from the composite message that it is attached to.

The degree of a check block is the number of blocks that it is attached to. The degree is determined by sampling a random distribution, p, which is defined as:

F = ⌈ ln(ε²/4) / ln(1 − ε/2) ⌉
p_1 = 1 − (1 + 1/F) / (1 + ε)
p_i = (1 − p_1) F / ((F − 1) i (i − 1))   for i = 2, ..., F

Once the degree of the check block is known, the blocks from the composite message which it is attached to are chosen uniformly.
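The following Python sketch, under the same illustrative assumptions as the outer-encoding example (byte-string blocks, a shared pseudo-random generator), builds the distribution p defined above and generates one check block from a composite message; the function names are assumptions chosen for this sketch.

import random
from math import ceil, log

def degree_distribution(eps=0.01):
    # Build the degree distribution p defined above:
    # F = ceil(ln(eps^2/4) / ln(1 - eps/2)), p_1 = 1 - (1 + 1/F)/(1 + eps),
    # p_i = (1 - p_1)*F / ((F - 1)*i*(i - 1)) for i = 2, ..., F.
    F = ceil(log(eps * eps / 4) / log(1 - eps / 2))
    p1 = 1 - (1 + 1 / F) / (1 + eps)
    return [p1] + [(1 - p1) * F / ((F - 1) * i * (i - 1)) for i in range(2, F + 1)]

def make_check_block(composite, rng, p):
    # Sample a degree from p, choose that many composite blocks uniformly at
    # random, and XOR them together to form the check block.
    degree = min(rng.choices(range(1, len(p) + 1), weights=p)[0], len(composite))
    indices = rng.sample(range(len(composite)), degree)
    block = bytes(len(composite[0]))
    for i in indices:
        block = bytes(x ^ y for x, y in zip(block, composite[i]))
    return indices, block   # in practice only a seed identifying the choices is sent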

Decoding

The decoder of the inner stage must hold on to check blocks that it cannot yet decode. A check block can only be decoded when all but one of the blocks it is attached to are known. The graph above shows the progress of an inner decoder. The x-axis plots the number of check blocks received, and the dashed line shows the number of check blocks which cannot currently be used. This climbs almost linearly at first, as many check blocks with degree > 1 are received but are unusable. At a certain point some check blocks suddenly become usable, resolving more blocks, which in turn makes further check blocks usable. Very quickly the whole file can be decoded.

As the graph also shows, the inner decoder falls just short of decoding everything for a while after receiving n check blocks. The outer encoding ensures that the few blocks the inner decoder cannot resolve are not an issue, as the file can be recovered without them.
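For illustration, the inner decoding rule described above can be sketched as a simple peeling decoder in Python; the data layout (index sets plus byte-string blocks) is an assumption chosen to match the earlier sketches, not the author's implementation.

def inner_decode(n_composite, received):
    # Sketch of the inner (peeling) decoder.  `received` is a list of
    # (indices, xor_block) pairs, e.g. as produced by make_check_block above.
    known = [None] * n_composite
    pending = [[set(idx), blk] for idx, blk in received]

    progress = True
    while progress:
        progress = False
        for entry in pending:
            # Peel off attached blocks that are already known.
            for i in [i for i in entry[0] if known[i] is not None]:
                entry[1] = bytes(x ^ y for x, y in zip(entry[1], known[i]))
                entry[0].discard(i)
            # A check block becomes usable once exactly one attached block is unknown.
            if len(entry[0]) == 1:
                i = entry[0].pop()
                if known[i] is None:
                    known[i] = entry[1]
                    progress = True
        pending = [e for e in pending if e[0]]
    return known   # blocks still None are left for the outer decoder to recover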

Related Research Articles

Coaxial cable: electrical cable type with concentric inner conductor, insulator, and conducting shield

Coaxial cable, or coax, is a type of electrical cable consisting of an inner conductor surrounded by a concentric conducting shield, with the two separated by a dielectric; many coaxial cables also have a protective outer sheath or jacket. The term "coaxial" refers to the inner conductor and the outer shield sharing a geometric axis.

A binary symmetric channel is a common communications channel model used in coding theory and information theory. In this model, a transmitter wishes to send a bit, and the receiver will receive a bit. The bit will be "flipped" with a "crossover probability" of p, and otherwise is received correctly. This model can be applied to varied communication channels such as telephone lines or disk drive storage.

Additive white Gaussian noise (AWGN) is a basic noise model used in information theory to mimic the effect of many random processes that occur in nature. The modifiers denote specific characteristics of the noise.

In information theory, a low-density parity-check (LDPC) code is a linear error correcting code, a method of transmitting a message over a noisy transmission channel. An LDPC code is constructed using a sparse Tanner graph. LDPC codes are capacity-approaching codes, which means that practical constructions exist that allow the noise threshold to be set very close to the theoretical maximum for a symmetric memoryless channel. The noise threshold defines an upper bound for the channel noise, up to which the probability of lost information can be made as small as desired. Using iterative belief propagation techniques, LDPC codes can be decoded in time linear in their block length.

In coding theory, block codes are a large and important family of error-correcting codes that encode data in blocks. There is a vast number of examples for block codes, many of which have a wide range of practical applications. The abstract definition of block codes is conceptually useful because it allows coding theorists, mathematicians, and computer scientists to study the limitations of all block codes in a unified way. Such limitations often take the form of bounds that relate different parameters of the block code to each other, such as its rate and its ability to detect and correct errors.

In coding theory, an erasure code is a forward error correction (FEC) code under the assumption of bit erasures, which transforms a message of k symbols into a longer message with n symbols such that the original message can be recovered from a subset of the n symbols. The fraction r = k/n is called the code rate. The fraction k’/k, where k’ denotes the number of symbols required for recovery, is called reception efficiency.

In information theory, Shannon's source coding theorem establishes the limits to possible data compression, and the operational meaning of the Shannon entropy.

In coding theory, decoding is the process of translating received messages into codewords of a given code. There have been many common methods of mapping messages to codewords. These are often used to recover messages sent over a noisy channel, such as a binary symmetric channel.

In computer science, Luby transform codes are the first class of practical fountain codes that are near-optimal erasure correcting codes. They were invented by Michael Luby in 1998 and published in 2002. Like some other fountain codes, LT codes depend on sparse bipartite graphs to trade reception overhead for encoding and decoding speed. The distinguishing characteristic of LT codes is in employing a particularly simple algorithm based on the exclusive or operation to encode and decode the message.

In computer science, Raptor codes are the first known class of fountain codes with linear time encoding and decoding. They were invented by Amin Shokrollahi in 2000/2001 and were first published in 2004 as an extended abstract. Raptor codes are a significant theoretical and practical improvement over LT codes, which were the first practical class of fountain codes.

In information theory, the noisy-channel coding theorem establishes that for any given degree of noise contamination of a communication channel, it is possible to communicate discrete data nearly error-free up to a computable maximum rate through the channel. This result was presented by Claude Shannon in 1948 and was based in part on earlier work and ideas of Harry Nyquist and Ralph Hartley.

In information theory, information dimension is an information measure for random vectors in Euclidean space, based on the normalized entropy of finely quantized versions of the random vectors. This concept was first introduced by Alfréd Rényi in 1959.

In information theory, the error exponent of a channel code or source code over the block length of the code is the rate at which the error probability decays exponentially with the block length of the code. Formally, it is defined as the limiting ratio of the negative logarithm of the error probability to the block length of the code for large block lengths. For example, if the probability of error of a decoder drops as e^(−nα), where n is the block length, the error exponent is α; in this example, −ln(P_error)/n approaches α for large n. Many of the information-theoretic theorems are of an asymptotic nature; for example, the channel coding theorem states that for any rate less than the channel capacity, the probability of error of the channel code can be made to go to zero as the block length goes to infinity. In practical situations, there are limitations to the delay of the communication and the block length must be finite. Therefore, it is important to study how quickly the probability of error drops as the block length grows.

In coding theory a variable-length code is a code which maps source symbols to a variable number of bits.

Hadamard code

The Hadamard code is an error-correcting code named after Jacques Hadamard that is used for error detection and correction when transmitting messages over very noisy or unreliable channels. In 1971, the code was used to transmit photos of Mars back to Earth from the NASA space probe Mariner 9. Because of its unique mathematical properties, the Hadamard code is not only used by engineers, but also intensely studied in coding theory, mathematics, and theoretical computer science. The Hadamard code is also known under the names Walsh code, Walsh family, and Walsh–Hadamard code in recognition of the American mathematician Joseph Leonard Walsh.

In coding theory, concatenated codes form a class of error-correcting codes that are derived by combining an inner code and an outer code. They were conceived in 1966 by Dave Forney as a solution to the problem of finding a code that has both exponentially decreasing error probability with increasing block length and polynomial-time decoding complexity. Concatenated codes became widely used in space communications in the 1970s.

In coding theory, list decoding is an alternative to unique decoding of error-correcting codes for large error rates. The notion was proposed by Elias in the 1950s. The main idea behind list decoding is that the decoding algorithm, instead of outputting a single possible message, outputs a list of possibilities, one of which is correct. This allows for handling a greater number of errors than that allowed by unique decoding.

A locally testable code is a type of error-correcting code for which it can be determined if a string is a word in that code by looking at a small number of bits of the string. In some situations, it is useful to know if the data is corrupted without decoding all of it so that appropriate action can be taken in response. For example, in communication, if the receiver encounters a corrupted code, it can request the data be re-sent, which could increase the accuracy of said data. Similarly, in data storage, these codes can allow for damaged data to be recovered and rewritten properly.

A locally decodable code (LDC) is an error-correcting code that allows a single bit of the original message to be decoded with high probability by only examining a small number of bits of a possibly corrupted codeword. This property could be useful, say, in a context where information is being transmitted over a noisy channel, and only a small subset of the data is required at a particular time and there is no need to decode the entire message at once. Note that locally decodable codes are not a subset of locally testable codes, though there is some overlap between the two.

Sudoku codes are non-linear forward error correcting codes following the rules of sudoku puzzles, designed for an erasure channel. Based on this model, the transmitter sends a sequence of all symbols of a solved sudoku. The receiver either receives a symbol correctly or receives an erasure symbol to indicate that the symbol was not received. The decoder gets a matrix with missing entries and uses the constraints of sudoku puzzles to reconstruct a limited number of erased symbols.