In coding theory, a constant-weight code, also called an m-of-n code, is an error detection and correction code where all codewords share the same Hamming weight. The one-hot code and the balanced code are two widely used kinds of constant-weight code.
The theory is closely connected to that of designs (such as t-designs and Steiner systems). Most of the work in this field of discrete mathematics concerns binary constant-weight codes.
Binary constant-weight codes have several applications, including frequency hopping in GSM networks.[1] Most barcodes use a binary constant-weight code to simplify automatic setting of the brightness threshold that distinguishes black and white stripes. Most line codes use either a constant-weight code or a nearly-constant-weight paired disparity code. Beyond their use as error correction codes, the large Hamming distance between code words can also be exploited in the design of asynchronous circuits such as delay-insensitive circuits.
Constant-weight codes, like Berger codes, can detect all unidirectional errors.
The central problem regarding constant-weight codes is the following: what is the maximum number of codewords in a binary constant-weight code with length n, Hamming distance d, and weight w? This number is called A(n, d, w).
Apart from some trivial observations, it is generally impossible to compute these numbers in a straightforward way. Upper bounds are given by several important theorems such as the first and second Johnson bounds,[2] and better upper bounds can sometimes be found in other ways. Lower bounds are most often found by exhibiting specific codes, either with the use of a variety of methods from discrete mathematics or through extensive computer search. A large table of such record-breaking codes was published in 1990,[3] and an extension to longer codes (but only for those values of n and d which are relevant for the GSM application) was published in 2006.[1]
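A lower bound on A(n, d, w) can always be exhibited by a greedy search over all weight-w words, in the spirit of the computer searches mentioned above. The following Python sketch (function name illustrative; brute force, practical only for small parameters) keeps each word that is far enough from everything kept so far:

```python
from itertools import combinations

def greedy_constant_weight_code(n: int, d: int, w: int) -> list[int]:
    """Greedy lower bound on A(n, d, w): scan all weight-w words of
    length n in lexicographic order of their supports, keeping each
    word whose Hamming distance to every kept word is at least d."""
    code: list[int] = []
    for support in combinations(range(n), w):
        word = sum(1 << i for i in support)
        # Hamming distance between two words is the popcount of their XOR.
        if all(bin(word ^ kept).count("1") >= d for kept in code):
            code.append(word)
    return code

# len(greedy_constant_weight_code(8, 4, 4)) is a lower bound on A(8, 4, 4).
```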
A special case of constant-weight codes are the one-of-N codes, which encode log2(N) bits in a code word of N bits. The one-of-two code uses the code words 01 and 10 to encode the bits '0' and '1'. A one-of-four code can use the words 0001, 0010, 0100, 1000 to encode the two-bit values 00, 01, 10, and 11. Examples are dual-rail encoding and the chain link encoding[4] used in delay-insensitive circuits. For these codes, n = N, d = 2, and w = 1.
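As a minimal sketch (names illustrative), a one-of-N encoder sets the single bit indexed by the value being encoded, and the decoder recovers that index; any received word without exactly one 1 is invalid:

```python
def one_of_n_encode(value: int, n: int) -> str:
    """Encode value in range(n) as the n-bit word whose only 1 sits at
    bit position `value`; for n = 4 this maps 0, 1, 2, 3 to
    0001, 0010, 0100, 1000 as in the example above."""
    assert 0 <= value < n
    return format(1 << value, f"0{n}b")

def one_of_n_decode(word: str) -> int:
    """Recover the encoded value; a word without exactly one 1 is
    rejected, which is how unidirectional errors are detected."""
    assert word.count("1") == 1
    return len(word) - 1 - word.index("1")
```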
Notable uses of one-hot codes include the biphase mark code, which uses a 1-of-2 code; pulse-position modulation, which uses a 1-of-n code; and address decoders.
In coding theory, a balanced code is a binary forward error correction code for which each codeword contains an equal number of zero and one bits. Balanced codes were introduced by Donald Knuth;[5] they are a subset of so-called unordered codes, which are codes having the property that the positions of ones in one codeword are never a subset of the positions of ones in another codeword. Like all unordered codes, balanced codes are suitable for the detection of all unidirectional errors in an encoded message. Balanced codes allow for particularly efficient decoding, which can be carried out in parallel.[5][6][7]
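Knuth's construction rests on a simple observation: complementing the first i bits of a k-bit word changes its weight by exactly one as i increases by one, and the weight moves from w at i = 0 to k − w at i = k, so it must pass through k/2 somewhere. A minimal Python sketch of that balancing step, assuming the chosen index i is transmitted alongside the word (in Knuth's scheme it is itself encoded in a short balanced prefix):

```python
def knuth_balance(word: str) -> tuple[int, str]:
    """Return (i, balanced) where complementing the first i bits of
    `word` (of even length k) yields equal numbers of 0s and 1s."""
    k = len(word)
    if k % 2:
        raise ValueError("word length must be even")
    for i in range(k + 1):
        flipped = "".join("1" if b == "0" else "0" for b in word[:i])
        candidate = flipped + word[i:]
        if candidate.count("1") == k // 2:
            return i, candidate
    raise AssertionError("unreachable: a balancing index always exists")

# knuth_balance("1111") -> (2, "0011")
```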
Notable uses of balanced codes include the biphase mark code, which uses a 1-of-2 code; 6b/8b encoding, which uses a 4-of-8 code; the Hadamard code, which is a 2^(k−1)-of-2^k code (except for the zero codeword); and the three-of-six code.
The 3-wire lane encoding used in MIPI C-PHY can be considered a generalization of constant-weight codes to a ternary alphabet: each wire transmits a ternary signal, and at any one instant one of the three wires is transmitting a low signal, one a middle signal, and one a high signal.[8]
An m-of-n code is a separable error detection code with a code word length of n bits, where each code word contains exactly m instances of a "one". A single bit error will cause the code word to have either m + 1 or m − 1 "ones". An example m-of-n code is the 2-of-5 code used by the United States Postal Service.
The simplest implementation is to append a string of ones to the original data until it contains m ones, then append zeros to create a code word of length n; the table below illustrates this for a 3-of-6 code, and a short encoder sketch follows the table.
Example:
Original 3 data bits | Appended bits
---|---
000 | 111
001 | 110
010 | 110
011 | 100
100 | 110
101 | 100
110 | 100
111 | 000
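A minimal Python sketch of this append-ones-then-zeros encoder (names illustrative); running the loop reproduces the 3-of-6 table above, and counting the ones in a received word gives the corresponding error-detection check:

```python
def encode_m_of_n(data: str, m: int, n: int) -> str:
    """Append 1s until the word contains m ones, then pad with 0s to length n."""
    ones_needed = m - data.count("1")
    zeros_needed = n - len(data) - ones_needed
    if ones_needed < 0 or zeros_needed < 0:
        raise ValueError("data cannot be completed to an m-of-n word")
    return data + "1" * ones_needed + "0" * zeros_needed

def is_valid_m_of_n(word: str, m: int, n: int) -> bool:
    """A single bit error leaves m + 1 or m - 1 ones, so it is caught here."""
    return len(word) == n and word.count("1") == m

for i in range(8):                       # reproduces the table above
    bits = format(i, "03b")
    print(bits, "->", encode_m_of_n(bits, 3, 6)[3:])
```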
Notable uses of constant-weight codes, other than the one-hot and balanced codes already mentioned above, include Code 39, which uses a 3-of-9 code; the bi-quinary coded decimal code, which uses a 2-of-7 code; and the 2-of-5 code.
In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code is Huffman coding, an algorithm developed by David A. Huffman while he was a Sc.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".
A cyclic redundancy check (CRC) is an error-detecting code commonly used in digital networks and storage devices to detect accidental changes to digital data. Blocks of data entering these systems get a short check value attached, based on the remainder of a polynomial division of their contents. On retrieval, the calculation is repeated and, in the event the check values do not match, corrective action can be taken against data corruption. CRCs can be used for error correction.
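As an illustration of the division idea (a toy sketch with one fixed 8-bit generator polynomial, not a claim about any particular deployed CRC), the check value is the modulo-2 remainder of the message divided by the generator:

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise modulo-2 long division, MSB first: returns the 8-bit
    remainder of `data` with respect to the generator polynomial
    x^8 + x^2 + x + 1 (0x07 with the leading x^8 term implicit)."""
    rem = 0
    for byte in data:
        rem ^= byte
        for _ in range(8):
            rem = ((rem << 1) ^ poly) if rem & 0x80 else (rem << 1)
        rem &= 0xFF
    return rem

msg = b"hello"
check = crc8(msg)
assert crc8(msg) == check        # on retrieval the values match
assert crc8(b"hellp") != check   # a corrupted block fails the check
```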
Differential Manchester encoding (DM) is a line code in digital frequency modulation in which data and clock signals are combined to form a single two-level self-synchronizing data stream. There is a mandatory level transition at the beginning of each bit period, and each data bit is encoded by the presence or absence of a transition in the middle of the period. The code is insensitive to an inversion of polarity. In various specific applications, this method is also called by various other names, including biphase mark code (BMC), F2F, Aiken biphase, and conditioned diphase.
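A sketch of the rule under one common convention (biphase mark: a 1 gets the extra mid-bit transition; the opposite assignment is also used in practice):

```python
def diff_manchester_encode(bits: list[int], level: int = 0) -> list[int]:
    """Emit two half-bit levels per data bit: the level always toggles
    at the start of a bit period (the mandatory transition) and toggles
    again mid-bit to encode a 1. Inverting every level preserves all
    transitions, which is why the code survives polarity inversion."""
    out = []
    for b in bits:
        level ^= 1          # mandatory transition at the period start
        out.append(level)
        level ^= b          # extra mid-bit transition iff b == 1
        out.append(level)
    return out

# diff_manchester_encode([1, 0, 0, 1]) -> [1, 0, 1, 1, 0, 0, 1, 0]
```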
In telecommunication, a line code is a pattern of voltage, current, or photons used to represent digital data transmitted down a communication channel or written to a storage medium. This repertoire of signals is usually called a constrained code in data storage systems. Some signals are more prone to error than others as the physics of the communication channel or storage medium constrains the repertoire of signals that can be used reliably.
The reflected binary code (RBC), also known as reflected binary (RB) or Gray code after Frank Gray, is an ordering of the binary numeral system such that two successive values differ in only one bit.
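The conversions between ordinary binary and Gray code are short enough to state as code; a standard sketch:

```python
def binary_to_gray(n: int) -> int:
    """Each Gray bit is the XOR of two adjacent binary bits, so
    consecutive integers map to words differing in exactly one bit."""
    return n ^ (n >> 1)

def gray_to_binary(g: int) -> int:
    """Invert the transform by XOR-folding all higher bits downward."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

# [format(binary_to_gray(i), "03b") for i in range(8)]
# -> ['000', '001', '011', '010', '110', '111', '101', '100']
```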
A prefix code is a type of code system distinguished by its possession of the "prefix property", which requires that no whole code word in the system is a prefix of any other code word in the system. The property holds trivially for fixed-length codes, so it is only a point of consideration for variable-length codes.
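The property is easy to check directly for a finite codebook; a sketch:

```python
def has_prefix_property(codewords: list[str]) -> bool:
    """True iff no code word is a prefix of a different code word."""
    return not any(
        a != b and b.startswith(a) for a in codewords for b in codewords
    )

assert has_prefix_property(["0", "10", "110", "111"])  # a prefix code
assert not has_prefix_property(["0", "01"])            # "0" prefixes "01"
```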
A binary code represents text, computer processor instructions, or any other data using a two-symbol system. The two-symbol system used is often "0" and "1" from the binary number system. The binary code assigns a pattern of binary digits, also known as bits, to each character, instruction, etc. For example, a binary string of eight bits can represent any of 256 possible values and can, therefore, represent a wide variety of different items.
Coding theory is the study of the properties of codes and their respective fitness for specific applications. Codes are used for data compression, cryptography, error detection and correction, data transmission and data storage. Codes are studied by various scientific disciplines—such as information theory, electrical engineering, mathematics, linguistics, and computer science—for the purpose of designing efficient and reliable data transmission methods. This typically involves the removal of redundancy and the correction or detection of errors in the transmitted data.
In information theory, a low-density parity-check (LDPC) code is a linear error correcting code, a method of transmitting a message over a noisy transmission channel. An LDPC code is constructed using a sparse Tanner graph. LDPC codes are capacity-approaching codes, which means that practical constructions exist that allow the noise threshold to be set very close to the theoretical maximum for a symmetric memoryless channel. The noise threshold defines an upper bound for the channel noise, up to which the probability of lost information can be made as small as desired. Using iterative belief propagation techniques, LDPC codes can be decoded in time linear in their block length.
In telecommunications, 8b/10b is a line code that maps 8-bit words to 10-bit symbols to achieve DC balance and bounded disparity, and at the same time provide enough state changes to allow reasonable clock recovery. This means that the difference between the counts of ones and zeros in a string of at least 20 bits is no more than two, and that there are not more than five ones or zeros in a row. This helps to reduce the demand on the lower bandwidth limit of the channel necessary to transfer the signal.
In coding theory, block codes are a large and important family of error-correcting codes that encode data in blocks. There is a vast number of examples for block codes, many of which have a wide range of practical applications. The abstract definition of block codes is conceptually useful because it allows coding theorists, mathematicians, and computer scientists to study the limitations of all block codes in a unified way. Such limitations often take the form of bounds that relate different parameters of the block code to each other, such as its rate and its ability to detect and correct errors.
Eight-to-fourteen modulation (EFM) is a data encoding technique – formally, a line code – used by compact discs (CD), laserdiscs (LD) and pre-Hi-MD MiniDiscs. EFMPlus is a related code, used in DVDs and Super Audio CDs (SACDs).
In coding theory, a linear code is an error-correcting code for which any linear combination of codewords is also a codeword. Linear codes are traditionally partitioned into block codes and convolutional codes, although turbo codes can be seen as a hybrid of these two types. Linear codes allow for more efficient encoding and decoding algorithms than other codes.
In coding theory, decoding is the process of translating received messages into codewords of a given code. There have been many common methods of mapping messages to codewords. These are often used to recover messages sent over a noisy channel, such as a binary symmetric channel.
In information theory, the noisy-channel coding theorem establishes that for any given degree of noise contamination of a communication channel, it is possible to communicate discrete data nearly error-free up to a computable maximum rate through the channel. This result was presented by Claude Shannon in 1948 and was based in part on earlier work and ideas of Harry Nyquist and Ralph Hartley.
In computing, telecommunication, information theory, and coding theory, forward error correction (FEC) or channel coding is a technique used for controlling errors in data transmission over unreliable or noisy communication channels.
In computer science and information theory, a canonical Huffman code is a particular type of Huffman code with unique properties which allow it to be described in a very compact manner. Rather than storing the structure of the code tree explicitly, canonical Huffman codes are ordered in such a way that it suffices to only store the lengths of the codewords, which reduces the overhead of the codebook.
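A minimal sketch of the reconstruction (the standard canonical assignment; names illustrative): sort symbols by codeword length, then by symbol, and hand out consecutive binary values, shifting left whenever the length increases:

```python
def canonical_codes(lengths: dict[str, int]) -> dict[str, str]:
    """Rebuild a canonical Huffman codebook from codeword lengths alone."""
    out: dict[str, str] = {}
    code, prev_len = 0, 0
    for sym, ln in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= ln - prev_len          # grow the code when lengths jump
        out[sym] = format(code, f"0{ln}b")
        code += 1
        prev_len = ln
    return out

# canonical_codes({"a": 1, "b": 2, "c": 3, "d": 3})
# -> {"a": "0", "b": "10", "c": "110", "d": "111"}
```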
The Hadamard code is an error-correcting code named after Jacques Hadamard that is used for error detection and correction when transmitting messages over very noisy or unreliable channels. In 1971, the code was used to transmit photos of Mars back to Earth from the NASA space probe Mariner 9. Because of its unique mathematical properties, the Hadamard code is not only used by engineers, but also intensely studied in coding theory, mathematics, and theoretical computer science. The Hadamard code is also known under the names Walsh code, Walsh family, and Walsh–Hadamard code in recognition of the American mathematician Joseph Leonard Walsh.
Lexicographic codes or lexicodes are greedily generated error-correcting codes with remarkably good properties. They were produced independently by Vladimir Levenshtein and by John Horton Conway and Neil Sloane. The binary lexicographic codes are linear codes, and include the Hamming codes and the binary Golay codes.
A locally decodable code (LDC) is an error-correcting code that allows a single bit of the original message to be decoded with high probability by only examining a small number of bits of a possibly corrupted codeword. This property could be useful, say, in a context where information is being transmitted over a noisy channel, and only a small subset of the data is required at a particular time and there is no need to decode the entire message at once. Note that locally decodable codes are not a subset of locally testable codes, though there is some overlap between the two.