Lexicographic codes, or lexicodes, are greedily generated error-correcting codes with remarkably good properties. They were introduced independently by Vladimir Levenshtein [1] and by John Horton Conway and Neil Sloane. [2] The binary lexicographic codes are linear codes, and include the Hamming codes and the binary Golay codes. [2]
A lexicode of length n and minimum distance d over a finite field is generated by starting with the all-zero vector and iteratively adding the next vector (in lexicographic order) whose Hamming distance from every vector added so far is at least d. For example, the length-3 lexicode of minimum distance 2 consists of the vectors marked with an "X" in the following table (a short program reproducing it appears after the table):
Vector | In code? |
---|---|
000 | X |
001 | |
010 | |
011 | X |
100 | |
101 | X |
110 | X |
111 | |
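A minimal C sketch of this greedy loop, with the parameters of the toy example above (it prints exactly the four marked vectors):

```c
#include <stdio.h>

/* Greedy construction of the length-3, distance-2 lexicode above. */
int main(void) {
    int code[8], count = 0;
    for (int v = 0; v < 8; v++) {         /* vectors in lexicographic order */
        int ok = 1;
        for (int k = 0; k < count; k++) { /* compare with accepted vectors  */
            int x = code[k] ^ v, dist = 0;
            while (x) { dist += x & 1; x >>= 1; }   /* Hamming distance */
            if (dist < 2) { ok = 0; break; }
        }
        if (ok) {
            code[count++] = v;            /* accept and print it */
            printf("%d%d%d\n", (v >> 2) & 1, (v >> 1) & 1, v & 1);
        }
    }
    return 0;
}
```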
Here is a table of the dimension m of the n-bit lexicode with minimum Hamming distance d; the resulting code contains 2^m codewords. For example, the single parity-check code (n=4, d=2, m=3), the extended Hamming code (n=8, d=4, m=4) and especially the Golay code (n=24, d=8, m=12) show exceptional compactness compared to their neighbors.
n \ d | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | |||||||||||||||||
2 | 2 | 1 | ||||||||||||||||
3 | 3 | 2 | 1 | |||||||||||||||
4 | 4 | 3 | 1 | 1 | ||||||||||||||
5 | 5 | 4 | 2 | 1 | 1 | |||||||||||||
6 | 6 | 5 | 3 | 2 | 1 | 1 | ||||||||||||
7 | 7 | 6 | 4 | 3 | 1 | 1 | 1 | |||||||||||
8 | 8 | 7 | 4 | 4 | 2 | 1 | 1 | 1 | ||||||||||
9 | 9 | 8 | 5 | 4 | 2 | 2 | 1 | 1 | 1 | |||||||||
10 | 10 | 9 | 6 | 5 | 3 | 2 | 1 | 1 | 1 | 1 | ||||||||
11 | 11 | 10 | 7 | 6 | 4 | 3 | 2 | 1 | 1 | 1 | 1 | |||||||
12 | 12 | 11 | 8 | 7 | 4 | 4 | 2 | 2 | 1 | 1 | 1 | 1 | ||||||
13 | 13 | 12 | 9 | 8 | 5 | 4 | 3 | 2 | 1 | 1 | 1 | 1 | 1 | |||||
14 | 14 | 13 | 10 | 9 | 6 | 5 | 4 | 3 | 2 | 1 | 1 | 1 | 1 | 1 | ||||
15 | 15 | 14 | 11 | 10 | 7 | 6 | 5 | 4 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | |||
16 | 16 | 15 | 11 | 11 | 8 | 7 | 5 | 5 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | ||
17 | 17 | 16 | 12 | 11 | 9 | 8 | 6 | 5 | 3 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | |
18 | 18 | 17 | 13 | 12 | 9 | 9 | 7 | 6 | 3 | 3 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 |
19 | 19 | 18 | 14 | 13 | 10 | 9 | 8 | 7 | 4 | 3 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 |
20 | 20 | 19 | 15 | 14 | 11 | 10 | 9 | 8 | 5 | 4 | 3 | 2 | 2 | 1 | 1 | 1 | 1 | 1 |
21 | 21 | 20 | 16 | 15 | 12 | 11 | 10 | 9 | 5 | 5 | 3 | 3 | 2 | 2 | 1 | 1 | 1 | 1 |
22 | 22 | 21 | 17 | 16 | 12 | 12 | 11 | 10 | 6 | 5 | 4 | 3 | 2 | 2 | 1 | 1 | 1 | 1 |
23 | 23 | 22 | 18 | 17 | 13 | 12 | 12 | 11 | 6 | 6 | 5 | 4 | 2 | 2 | 2 | 1 | 1 | 1 |
24 | 24 | 23 | 19 | 18 | 14 | 13 | 12 | 12 | 7 | 6 | 5 | 5 | 3 | 2 | 2 | 2 | 1 | 1 |
25 | 25 | 24 | 20 | 19 | 15 | 14 | 12 | 12 | 8 | 7 | 6 | 5 | 3 | 3 | 2 | 2 | 1 | 1 |
26 | 26 | 25 | 21 | 20 | 16 | 15 | 12 | 12 | 9 | 8 | 7 | 6 | 4 | 3 | 2 | 2 | 2 | 1 |
27 | 27 | 26 | 22 | 21 | 17 | 16 | 13 | 12 | 9 | 9 | 7 | 7 | 5 | 4 | 3 | 2 | 2 | 2 |
28 | 28 | 27 | 23 | 22 | 18 | 17 | 13 | 13 | 10 | 9 | 8 | 7 | 5 | 5 | 3 | 3 | 2 | 2 |
29 | 29 | 28 | 24 | 23 | 19 | 18 | 14 | 13 | 11 | 10 | 8 | 8 | 6 | 5 | 4 | 3 | 2 | 2 |
30 | 30 | 29 | 25 | 24 | 19 | 19 | 15 | 14 | 12 | 11 | 9 | 8 | 6 | 6 | 5 | 4 | 2 | 2 |
31 | 31 | 30 | 26 | 25 | 20 | 19 | 16 | 15 | 12 | 12 | 10 | 9 | 6 | 6 | 6 | 5 | 3 | 2 |
32 | 32 | 31 | 26 | 26 | 21 | 20 | 16 | 16 | 13 | 12 | 11 | 10 | 7 | 6 | 6 | 6 | 3 | 3 |
33 | ... | 32 | ... | 26 | ... | 21 | ... | 16 | ... | 13 | ... | 11 | ... | 7 | ... | 6 | ... | 3 |
Every lexicode with odd minimum distance d is an exact copy of the lexicode of even distance d + 1 and length n + 1 with its last coordinate deleted; equivalently, appending an overall parity bit to the odd-distance code yields the even-distance one. An odd distance therefore never produces anything new or more interesting than the even distance d + 1.
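A small C sketch that checks this relation for short lengths (a verification under the convention that vectors are integers read most-significant-bit first, so "deleting the last coordinate" is a right shift by one):

```c
#include <stdio.h>

static int popcount(unsigned x) {
    int c = 0;
    while (x) { c += x & 1; x >>= 1; }
    return c;
}

/* Greedy lexicode of length n, minimum distance d; returns its size. */
static int lexicode(int n, int d, unsigned *out) {
    int count = 0;
    for (unsigned v = 0; v < (1u << n); v++) {
        int k;
        for (k = count - 1; k >= 0; k--)
            if (popcount(out[k] ^ v) < d)
                break;
        if (k == -1)
            out[count++] = v;
    }
    return count;
}

int main(void) {
    static unsigned a[1 << 12], b[1 << 12];
    for (int d = 1; d <= 5; d += 2)             /* odd distances d     */
        for (int n = d; n <= 10; n++) {         /* short lengths n     */
            int na = lexicode(n, d, a);
            int nb = lexicode(n + 1, d + 1, b);
            int ok = (na == nb);
            for (int i = 0; ok && i < na; i++)  /* drop the last       */
                ok = ((b[i] >> 1) == a[i]);     /* coordinate (bit 0)  */
            printf("n=%2d d=%d : %s\n", n, d, ok ? "match" : "MISMATCH");
        }
    return 0;
}
```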
Since lexicodes are linear, they can also be constructed by means of their basis. [3]
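Because of this linearity, the greedy search can be restricted to basis vectors: a candidate v is adjoined whenever v + c has weight at least d for every codeword c found so far, after which the stored code doubles. The sketch below illustrates this idea (it is not the construction given in [3]); with N = 8, D = 4 it prints a basis of the extended Hamming code (m = 4):

```c
#include <stdio.h>

#define N 8   /* length           */
#define D 4   /* minimum distance */

static int popcount(unsigned x) {
    int c = 0;
    while (x) { c += x & 1; x >>= 1; }
    return c;
}

int main(void) {
    static unsigned code[1 << N];  /* span of the basis so far, initially {0} */
    int size = 1, dim = 0;
    for (unsigned v = 1; v < (1u << N); v++) {
        int ok = 1;
        for (int k = 0; k < size; k++)        /* v is admissible iff      */
            if (popcount(code[k] ^ v) < D) {  /* v + c has weight >= D    */
                ok = 0;                       /* for every codeword c     */
                break;
            }
        if (ok) {                             /* adjoin v: the code       */
            for (int k = 0; k < size; k++)    /* doubles to C u (v + C)   */
                code[size + k] = code[k] ^ v;
            size *= 2;
            printf("basis vector %d: ", dim++);
            for (int j = N - 1; j >= 0; j--)
                putchar('0' + ((v >> j) & 1));
            putchar('\n');
        }
    }
    printf("dimension m = %d, %d codewords\n", dim, size);
    return 0;
}
```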
The following C program generates a lexicographic code greedily; its parameters are set for the binary Golay code (N=24, D=8).
```c
#include <stdio.h>
#include <stdlib.h>

int main() {                             /* GOLAY CODE generation */
    int i, j, k;
    int _pc[1 << 16] = {0};              // Popcount table & macro
    for (i = 0; i < (1 << 16); i++)
        for (j = 0; j < 16; j++)
            _pc[i] += (i >> j) & 1;
#define pc(X) (_pc[(X) & 0xffff] + _pc[((X) >> 16) & 0xffff])
#define N 24                             // N bits
#define D 8                              // D bits distance
    unsigned int *z = malloc(1 << 29);
    for (i = j = 0; i < (1 << N); i++) { // Scan all previous
        for (k = j - 1; k >= 0; k--)     // lexicodes.
            if (pc(z[k] ^ i) < D)        // Reverse checking
                break;                   // is way faster...
        if (k == -1) {                   // Add new lexicode
            for (k = 0; k < N; k++)      // & print it
                printf("%d", (i >> k) & 1);
            printf(" : %d\n", j);
            z[j++] = i;
        }
    }
}
```
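The program precomputes a 16-bit popcount table and evaluates each 24-bit distance as the sum of two table lookups. Run with N=24 and D=8 it should print 2^12 = 4096 codewords, the binary Golay code; note that the 512 MB allocation (1 << 29 bytes) is far larger than the 4096 stored words actually require.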
The theory of lexicographic codes is closely connected to combinatorial game theory. In particular, the codewords in a binary lexicographic code of distance d encode the winning positions in a variant of Grundy's game, played on a collection of heaps of stones, in which each move consists of replacing any one heap by at most d − 1 smaller heaps, and the goal is to take the last stone. [2]
In computer science and telecommunication, Hamming codes are a family of linear error-correcting codes. Hamming codes can detect one-bit and two-bit errors, or correct one-bit errors without detection of uncorrected errors. By contrast, the simple parity code cannot correct errors, and can detect only an odd number of bits in error. Hamming codes are perfect codes, that is, they achieve the highest possible rate for codes with their block length and minimum distance of three. Richard W. Hamming invented Hamming codes in 1950 as a way of automatically correcting errors introduced by punched card readers. In his original paper, Hamming elaborated his general idea, but specifically focused on the Hamming(7,4) code which adds three parity bits to four bits of data.
In information theory, the Hamming distance between two strings or vectors of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number of substitutions required to change one string into the other, or equivalently, the minimum number of errors that could have transformed one string into the other. In a more general context, the Hamming distance is one of several string metrics for measuring the edit distance between two sequences. It is named after the American mathematician Richard Hamming.
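For bit vectors stored in machine words, the Hamming distance is the population count of the XOR, which is exactly what the pc() macro in the lexicode program above exploits. A minimal sketch:

```c
#include <stdio.h>

/* Hamming distance of two words = number of differing bit positions,
   i.e. the population count of their XOR. */
static int hamming(unsigned a, unsigned b) {
    unsigned x = a ^ b;
    int d = 0;
    while (x) { x &= x - 1; d++; }   /* clear the lowest set bit */
    return d;
}

int main(void) {
    printf("%d\n", hamming(0x5, 0x6));   /* 0101 vs 0110 -> 2 */
    return 0;
}
```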
In the field of data compression, Shannon–Fano coding, named after Claude Shannon and Robert Fano, is a name given to two different but related techniques for constructing a prefix code based on a set of symbols and their probabilities.
Coding theory is the study of the properties of codes and their respective fitness for specific applications. Codes are used for data compression, cryptography, error detection and correction, data transmission and data storage. Codes are studied by various scientific disciplines—such as information theory, electrical engineering, mathematics, linguistics, and computer science—for the purpose of designing efficient and reliable data transmission methods. This typically involves the removal of redundancy and the correction or detection of errors in the transmitted data.
In mathematics, the Leech lattice is an even unimodular lattice Λ24 in 24-dimensional Euclidean space, which is one of the best models for the kissing number problem. It was discovered by John Leech (1967). It may also have been discovered by Ernst Witt in 1940.
In mathematics and electronics engineering, a binary Golay code is a type of linear error-correcting code used in digital communications. The binary Golay code, along with the ternary Golay code, has a particularly deep and interesting connection to the theory of finite sporadic groups in mathematics. These codes are named in honor of Marcel J. E. Golay whose 1949 paper introducing them has been called, by E. R. Berlekamp, the "best single published page" in coding theory.
In coding theory, block codes are a large and important family of error-correcting codes that encode data in blocks. There is a vast number of examples for block codes, many of which have a wide range of practical applications. The abstract definition of block codes is conceptually useful because it allows coding theorists, mathematicians, and computer scientists to study the limitations of all block codes in a unified way. Such limitations often take the form of bounds that relate different parameters of the block code to each other, such as its rate and its ability to detect and correct errors.
In coding theory, a linear code is an error-correcting code for which any linear combination of codewords is also a codeword. Linear codes are traditionally partitioned into block codes and convolutional codes, although turbo codes can be seen as a hybrid of these two types. Linear codes allow for more efficient encoding and decoding algorithms than other codes.
In mathematics and computer science, in the field of coding theory, the Hamming bound is a limit on the parameters of an arbitrary block code: it is also known as the sphere-packing bound or the volume bound from an interpretation in terms of packing balls in the Hamming metric into the space of all possible words. It gives an important limitation on the efficiency with which any error-correcting code can utilize the space in which its code words are embedded. A code that attains the Hamming bound is said to be a perfect code.
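Concretely, for a binary code C of length n correcting t = ⌊(d − 1)/2⌋ errors, the bound reads

$$ |C| \cdot \sum_{k=0}^{t} \binom{n}{k} \le 2^n, $$

and the [23, 12, 7] binary Golay code attains it: 2^12 · (1 + 23 + 253 + 1771) = 2^12 · 2^11 = 2^23, so it is perfect.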
In coding theory, decoding is the process of translating received messages into codewords of a given code. There have been many common methods of mapping messages to codewords. These are often used to recover messages sent over a noisy channel, such as a binary symmetric channel.
Reed–Muller codes are error-correcting codes that are used in wireless communications applications, particularly in deep-space communication. Moreover, the proposed 5G standard relies on the closely related polar codes for error correction in the control channel. Due to their favorable theoretical and mathematical properties, Reed–Muller codes have also been extensively studied in theoretical computer science.
In statistics and coding theory, a Hamming space is usually the set of all binary strings of length N. It is used in the theory of coding signals and transmission.
In coding theory, a cyclic code is a block code in which the circular shifts of each codeword give another word that belongs to the code. Cyclic codes are error-correcting codes with algebraic properties that are convenient for efficient error detection and correction.
In coding theory, a generator matrix is a matrix whose rows form a basis for a linear code. The codewords are all of the linear combinations of the rows of this matrix, that is, the linear code is the row space of its generator matrix.
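As a toy illustration (the matrix below is an illustrative choice, a generator of the length-4 even-weight code mentioned earlier), the row space can be enumerated by XOR-ing selected rows:

```c
#include <stdio.h>

/* Rows of a generator matrix stored as bitmasks: 1100, 0110, 0011. */
int main(void) {
    unsigned g[3] = {0xC, 0x6, 0x3};
    for (unsigned m = 0; m < (1u << 3); m++) {   /* all 2^3 messages    */
        unsigned c = 0;
        for (int i = 0; i < 3; i++)
            if ((m >> i) & 1)
                c ^= g[i];                       /* sum of chosen rows  */
        for (int j = 3; j >= 0; j--)
            putchar('0' + ((c >> j) & 1));
        putchar('\n');
    }
    return 0;
}
```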
The Hadamard code is an error-correcting code named after Jacques Hadamard that is used for error detection and correction when transmitting messages over very noisy or unreliable channels. In 1971, the code was used to transmit photos of Mars back to Earth from the NASA space probe Mariner 9. Because of its unique mathematical properties, the Hadamard code is not only used by engineers, but also intensely studied in coding theory, mathematics, and theoretical computer science. The Hadamard code is also known under the names Walsh code, Walsh family, and Walsh–Hadamard code in recognition of the American mathematician Joseph Leonard Walsh.
In coding theory, Hamming(7,4) is a linear error-correcting code that encodes four bits of data into seven bits by adding three parity bits. It is a member of a larger family of Hamming codes, but the term Hamming code often refers to this specific code that Richard W. Hamming introduced in 1950. At the time, Hamming worked at Bell Telephone Laboratories and was frustrated with the error-prone punched card reader, which is why he started working on error-correcting codes.
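A minimal encoder sketch in the classic position-indexed layout, where the parity bits occupy positions 1, 2 and 4 and each covers the positions whose index has the corresponding binary digit set (other equivalent bit layouts exist):

```c
#include <stdio.h>

/* Hamming(7,4): data bits d1..d4 go to positions 3, 5, 6, 7;
   parity bits sit at positions 1, 2, 4 (1-indexed). */
static unsigned encode(unsigned d) {             /* d: 4 data bits     */
    unsigned d1 = d & 1, d2 = (d >> 1) & 1;
    unsigned d3 = (d >> 2) & 1, d4 = (d >> 3) & 1;
    unsigned p1 = d1 ^ d2 ^ d4;                  /* covers 1, 3, 5, 7  */
    unsigned p2 = d1 ^ d3 ^ d4;                  /* covers 2, 3, 6, 7  */
    unsigned p4 = d2 ^ d3 ^ d4;                  /* covers 4, 5, 6, 7  */
    /* codeword positions 1..7: p1 p2 d1 p4 d2 d3 d4 */
    return p1 | (p2 << 1) | (d1 << 2) | (p4 << 3)
              | (d2 << 4) | (d3 << 5) | (d4 << 6);
}

int main(void) {
    for (unsigned d = 0; d < 16; d++) {
        unsigned c = encode(d);
        for (int pos = 1; pos <= 7; pos++)       /* position 1 first */
            putchar('0' + ((c >> (pos - 1)) & 1));
        printf("  <- data %u\n", d);
    }
    return 0;
}
```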
In coding theory, a parity-check matrix of a linear block code C is a matrix which describes the linear relations that the components of a codeword must satisfy. It can be used to decide whether a particular vector is a codeword and is also used in decoding algorithms.
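For the Hamming(7,4) layout sketched above, the parity-check matrix has the binary numbers 1 through 7 as its columns, so the syndrome of a received word is simply the XOR of the indices of its set positions; a nonzero syndrome names the flipped bit. A minimal sketch:

```c
#include <stdio.h>

/* Syndrome of a 7-bit word r, with bit i-1 holding position i:
   0 means r is a codeword, otherwise the single flipped position. */
static int syndrome(unsigned r) {
    int s = 0;
    for (int pos = 1; pos <= 7; pos++)
        if ((r >> (pos - 1)) & 1)
            s ^= pos;
    return s;
}

int main(void) {
    unsigned c = 0x00;                       /* all-zero codeword      */
    printf("%d\n", syndrome(c));             /* 0: valid word          */
    printf("%d\n", syndrome(c ^ (1 << 4)));  /* 5: position 5 flipped  */
    return 0;
}
```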
In coding theory, list decoding is an alternative to unique decoding of error-correcting codes for large error rates. The notion was proposed by Elias in the 1950s. The main idea behind list decoding is that the decoding algorithm instead of outputting a single possible message outputs a list of possibilities one of which is correct. This allows for handling a greater number of errors than that allowed by unique decoding.
A locally decodable code (LDC) is an error-correcting code that allows a single bit of the original message to be decoded with high probability by only examining a small number of bits of a possibly corrupted codeword. This property could be useful, say, in a context where information is being transmitted over a noisy channel, and only a small subset of the data is required at a particular time and there is no need to decode the entire message at once. Note that locally decodable codes are not a subset of locally testable codes, though there is some overlap between the two.
DNA code construction refers to the application of coding theory to the design of nucleic acid systems for the field of DNA–based computation.