Lexicographic code

Lexicographic codes or lexicodes are greedily generated error-correcting codes with remarkably good properties. They were introduced independently by Vladimir Levenshtein [1] and by John Horton Conway and Neil Sloane. [2] The binary lexicographic codes are linear codes, and include the Hamming codes and the binary Golay codes. [2]

Construction

A lexicode of length n and minimum distance d over a finite field is generated by starting with the all-zero vector and iteratively adding the next vector (in lexicographic order) whose Hamming distance from every vector added so far is at least d. As an example, the length-3 lexicode of minimum distance 2 consists of the vectors marked with an "X" in the following table:

Vector   In code?
000      X
001
010
011      X
100
101      X
110      X
111
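
As a concrete check of this worked example, here is a short C sketch of the greedy construction for n = 3 and d = 2 (not part of the original article; it relies on the GCC/Clang builtin __builtin_popcount), which reproduces the table above:

#include <stdio.h>

/* Greedy construction of the length-3, distance-2 lexicode.
   Vectors are scanned in lexicographic order; a vector is kept if its
   Hamming distance to every codeword kept so far is at least 2. */
int main(void) {
    unsigned code[8];
    int count = 0;
    for (unsigned v = 0; v < 8; v++) {
        int ok = 1;
        for (int k = 0; k < count && ok; k++)
            ok = __builtin_popcount(code[k] ^ v) >= 2;
        if (ok)
            code[count++] = v;
        for (int b = 2; b >= 0; b--)        /* print the vector ...          */
            printf("%u", (v >> b) & 1);
        printf("%s\n", ok ? " X" : "");     /* ... and its membership mark   */
    }
    return 0;
}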

The following table gives the dimension m of the n-bit lexicode with minimal Hamming distance d; such a code contains 2^m codewords. For example, the F4 code (n = 4, d = 2, m = 3), the extended Hamming code (n = 8, d = 4, m = 4), and especially the binary Golay code (n = 24, d = 8, m = 12) stand out as exceptionally compact compared to their neighbors in the table.

n\d |  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
  1 |  1
  2 |  2  1
  3 |  3  2  1
  4 |  4  3  1  1
  5 |  5  4  2  1  1
  6 |  6  5  3  2  1  1
  7 |  7  6  4  3  1  1  1
  8 |  8  7  4  4  2  1  1  1
  9 |  9  8  5  4  2  2  1  1  1
 10 | 10  9  6  5  3  2  1  1  1  1
 11 | 11 10  7  6  4  3  2  1  1  1  1
 12 | 12 11  8  7  4  4  2  2  1  1  1  1
 13 | 13 12  9  8  5  4  3  2  1  1  1  1  1
 14 | 14 13 10  9  6  5  4  3  2  1  1  1  1  1
 15 | 15 14 11 10  7  6  5  4  2  2  1  1  1  1  1
 16 | 16 15 11 11  8  7  5  5  2  2  1  1  1  1  1  1
 17 | 17 16 12 11  9  8  6  5  3  2  2  1  1  1  1  1  1
 18 | 18 17 13 12  9  9  7  6  3  3  2  2  1  1  1  1  1  1
 19 | 19 18 14 13 10  9  8  7  4  3  2  2  1  1  1  1  1  1
 20 | 20 19 15 14 11 10  9  8  5  4  3  2  2  1  1  1  1  1
 21 | 21 20 16 15 12 11 10  9  5  5  3  3  2  2  1  1  1  1
 22 | 22 21 17 16 12 12 11 10  6  5  4  3  2  2  1  1  1  1
 23 | 23 22 18 17 13 12 12 11  6  6  5  4  2  2  2  1  1  1
 24 | 24 23 19 18 14 13 12 12  7  6  5  5  3  2  2  2  1  1
 25 | 25 24 20 19 15 14 12 12  8  7  6  5  3  3  2  2  1  1
 26 | 26 25 21 20 16 15 12 12  9  8  7  6  4  3  2  2  2  1
 27 | 27 26 22 21 17 16 13 12  9  9  7  7  5  4  3  2  2  2
 28 | 28 27 23 22 18 17 13 13 10  9  8  7  5  5  3  3  2  2
 29 | 29 28 24 23 19 18 14 13 11 10  8  8  6  5  4  3  2  2
 30 | 30 29 25 24 19 19 15 14 12 11  9  8  6  6  5  4  2  2
 31 | 31 30 26 25 20 19 16 15 12 12 10  9  6  6  6  5  3  2
 32 | 32 31 26 26 21 20 16 16 13 12 11 10  7  6  6  6  3  3
 33 | 33 32 ... 26 ... 21 ... 16 ... 13 ... 11 ... 7 ... 6 ... 3

Every lexicode with odd minimum distance d is an exact copy of the lexicode with even distance d + 1, minus its last coordinate; equivalently, the even-distance code is obtained from the odd-distance one by appending an overall parity bit. Odd distances therefore produce nothing essentially new beyond the corresponding even-distance codes, as the sketch below illustrates.
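
This deletion relation can be spot-checked by brute force. The following sketch (our own illustration, again using __builtin_popcount) builds the (n = 7, d = 3) lexicode and the (n = 8, d = 4) lexicode and confirms that deleting the last coordinate of the latter yields exactly the former:

#include <stdio.h>

/* Greedily build the lexicode of length n and minimum distance d.
   A codeword's first coordinate is the most significant of its n bits,
   so increasing integer order coincides with lexicographic order.
   Returns the number of codewords written to out[]. */
int lexicode(int n, int d, unsigned out[], int max) {
    int count = 0;
    for (unsigned v = 0; v < (1u << n); v++) {
        int ok = 1;
        for (int k = 0; k < count && ok; k++)
            ok = __builtin_popcount(out[k] ^ v) >= d;
        if (ok && count < max)
            out[count++] = v;
    }
    return count;
}

int main(void) {
    unsigned odd[1 << 7], even[1 << 8];
    int n = 7;
    int a = lexicode(n, 3, odd, 1 << 7);        /* 16 codewords, the Hamming code     */
    int b = lexicode(n + 1, 4, even, 1 << 8);   /* 16 codewords, extended Hamming     */
    int same = (a == b);
    for (int k = 0; k < a && same; k++)
        same = (odd[k] == even[k] >> 1);        /* >> 1 drops the last coordinate     */
    printf("%s (%d codewords each)\n", same ? "identical" : "differ", a);
    return 0;
}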

Since lexicodes are linear, they can also be constructed by means of a basis. [3]
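
One concrete way to see this (a hedged illustration of our own, not the trellis-oriented construction of [3]): empirically, the codewords at positions 1, 2, 4, 8, ... of the greedy ordering form a basis. The sketch below rebuilds the n = 7, d = 3 lexicode, takes those codewords as a candidate basis, and verifies that their span reproduces the whole code:

#include <stdio.h>
#include <stdlib.h>

/* Same greedy helper as in the previous sketch. */
int lexicode(int n, int d, unsigned out[], int max) {
    int count = 0;
    for (unsigned v = 0; v < (1u << n); v++) {
        int ok = 1;
        for (int k = 0; k < count && ok; k++)
            ok = __builtin_popcount(out[k] ^ v) >= d;
        if (ok && count < max)
            out[count++] = v;
    }
    return count;
}

static int cmp(const void *a, const void *b) {
    unsigned x = *(const unsigned *)a, y = *(const unsigned *)b;
    return (x > y) - (x < y);
}

int main(void) {
    unsigned z[1 << 7], span[1 << 7], basis[7];
    int count = lexicode(7, 3, z, 1 << 7);  /* 16 codewords, so m = 4            */
    int m = 0;
    while ((1 << m) < count) {              /* candidates: z[1], z[2], z[4], ... */
        basis[m] = z[1 << m];
        m++;
    }
    for (int s = 0; s < (1 << m); s++) {    /* enumerate the span of the basis   */
        span[s] = 0;
        for (int i = 0; i < m; i++)
            if ((s >> i) & 1)
                span[s] ^= basis[i];
    }
    qsort(span, 1 << m, sizeof span[0], cmp);
    int same = (count == (1 << m));
    for (int k = 0; k < count && same; k++)
        same = (span[k] == z[k]);           /* z[] is already sorted ascending   */
    printf("the span %s the lexicode\n", same ? "equals" : "differs from");
    return 0;
}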

Implementation

The following C program generates a lexicographic code; its parameters are set for the binary Golay code (N = 24, D = 8).

#include <stdio.h>
#include <stdlib.h>

int main() {
    /* GOLAY CODE generation */
    int i, j, k;
    int _pc[1 << 16] = {0};                // PopCount table, used by the pc() macro
    for (i = 0; i < (1 << 16); i++)
        for (j = 0; j < 16; j++)
            _pc[i] += (i >> j) & 1;
#define pc(X) (_pc[(X)&0xffff] + _pc[((X)>>16)&0xffff])
#define N 24 // N bits
#define D 8  // D bits distance
    unsigned int *z = malloc(1 << 29);
    for (i = j = 0; i < (1 << N); i++) {
        for (k = j - 1; k >= 0; k--)       // Scan all previous lexicodes.
            if (pc(z[k] ^ i) < D)          // Reverse checking is way faster...
                break;
        if (k == -1) {                     // Add new lexicode & print it
            for (k = 0; k < N; k++)
                printf("%d", (i >> k) & 1);
            printf(" : %d\n", j);
            z[j++] = i;
        }
    }
    return 0;
}
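
Assuming the source is saved as lexicode.c, it can be compiled and run with, for example, cc -O2 lexicode.c && ./a.out; for the Golay parameters it prints all 2^12 = 4096 codewords, each followed by its index. Note that the malloc(1 << 29) call reserves 512 MB, a generous bound: at most 2^N four-byte codewords are ever stored (64 MB for N = 24).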

Combinatorial game theory

The theory of lexicographic codes is closely connected to combinatorial game theory. In particular, the codewords in a binary lexicographic code of distance d encode the winning positions in a variant of Grundy's game, played on a collection of heaps of stones, in which each move consists of replacing any one heap by at most d − 1 smaller heaps, and the goal is to take the last stone. [2]
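
To make the game side concrete, read the move rule as: replace one heap of size h by any multiset of at most d − 1 heaps, each of size strictly less than h (possibly no heaps at all, i.e. removing it). Single-heap Grundy values then follow the usual mex recursion, and a position is a second-player win exactly when the XOR of its heaps' Grundy values is zero. The following sketch (our own reading of the description above, for d = 3, i.e. at most two smaller heaps) computes the first few values:

#include <stdio.h>

#define MAXH 16                      /* largest heap size to tabulate */

int main(void) {
    int g[MAXH + 1];
    for (int h = 1; h <= MAXH; h++) {
        char seen[4096] = {0};       /* Grundy values reachable from a heap of h  */
        seen[0] = 1;                 /* remove the heap entirely (zero heaps)     */
        for (int a = 1; a < h; a++) {
            seen[g[a]] = 1;                 /* replace by one smaller heap        */
            for (int b = a; b < h; b++)
                seen[g[a] ^ g[b]] = 1;      /* replace by two smaller heaps       */
        }
        int mex = 0;                        /* minimum excluded value             */
        while (seen[mex])
            mex++;
        g[h] = mex;
        printf("G(%2d) = %d\n", h, g[h]);
    }
    return 0;
}

For d = 3 this prints 1, 2, 4, 7, 8, 11, ..., the numbers with an odd number of 1 bits, the same sequence that appears as Grundy values in the coin-turning game Mock Turtles.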

Notes

  1. Levenšteĭn, V. I. (1960), "Об одном классе систематических кодов" [A class of systematic codes], Doklady Akademii Nauk SSSR (in Russian), 131 (5): 1011–1014, MR 0122629; English translation in Soviet Math. Doklady 1 (1960), 368–371.
  2. Conway, John H.; Sloane, N. J. A. (1986), "Lexicographic codes: error-correcting codes from game theory", IEEE Transactions on Information Theory, 32 (3): 337–348, CiteSeerX 10.1.1.392.795, doi:10.1109/TIT.1986.1057187, MR 0838197.
  3. Trachtenberg, Ari (2002), "Designing lexicographic codes with a given trellis complexity", IEEE Transactions on Information Theory, 48 (1): 89–100, doi:10.1109/18.971740, MR 1866958.
