Hamming(7,4)

Hamming(7,4) code
Named after: Richard W. Hamming

Classification
Type: Linear block code
Block length: 7
Message length: 4
Rate: 4/7 ≈ 0.571
Distance: 3
Alphabet size: 2
Notation: [7,4,3]2-code

Properties
Perfect code

Graphical depiction of the 4 data bits d1 to d4, the 3 parity bits p1 to p3, and which parity bits apply to which data bits

In coding theory, Hamming(7,4) is a linear error-correcting code that encodes four bits of data into seven bits by adding three parity bits. It is a member of a larger family of Hamming codes, but the term Hamming code often refers to this specific code, which Richard W. Hamming introduced in 1950. At the time, Hamming worked at Bell Telephone Laboratories and was frustrated with the error-prone punched card reader, which is why he started working on error-correcting codes.[1]

The Hamming code adds three additional check bits to every four data bits of the message. Hamming's (7,4) algorithm can correct any single-bit error, or detect all single-bit and two-bit errors. In other words, the minimal Hamming distance between any two correct codewords is 3, and received words can be correctly decoded if they are at a distance of at most one from the codeword that was transmitted by the sender. This means that for transmission medium situations where burst errors do not occur, Hamming's (7,4) code is effective (as the medium would have to be extremely noisy for two out of seven bits to be flipped).

In quantum information, the Hamming(7,4) code is used as the basis for the Steane code, a type of CSS code used for quantum error correction.

Goal

The goal of the Hamming codes is to create a set of parity bits that overlap so that a single-bit error in a data bit or a parity bit can be detected and corrected. While multiple overlaps can be created, the general method is presented in Hamming codes.

Bit #            1    2    3    4    5    6    7
Transmitted bit  p1   p2   d1   p3   d2   d3   d4
p1               Yes  No   Yes  No   Yes  No   Yes
p2               No   Yes  Yes  No   No   Yes  Yes
p3               No   No   No   Yes  Yes  Yes  Yes

This table describes which parity bits cover which transmitted bits in the encoded word. For example, p2 provides even parity for bits 2, 3, 6, and 7. Reading down a column, it also details which transmitted bit is covered by which parity bits. For example, d1 is covered by p1 and p2 but not p3. This table bears a striking resemblance to the parity-check matrix (H) in the next section.

Furthermore, if the parity columns in the above table were removed

     d1   d2   d3   d4
p1   Yes  Yes  No   Yes
p2   Yes  No   Yes  Yes
p3   No   Yes  Yes  Yes

then resemblance to rows 1, 2, and 4 of the code generator matrix (G) below will also be evident.

So, by picking the parity bit coverage correctly, all errors with a Hamming distance of 1 can be detected and corrected, which is the point of using a Hamming code.

Hamming matrices

Hamming codes can be computed in linear algebra terms through matrices because Hamming codes are linear codes. For the purposes of Hamming codes, two Hamming matrices can be defined: the code generator matrix G and the parity-check matrix H:
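With the bit ordering p1 p2 d1 p3 d2 d3 d4 used throughout this article (so that the codeword is x = Gp and the syndrome is z = Hr), the standard choice consistent with the coverage table above is:

$$
\mathbf{G} = \begin{pmatrix}
1 & 1 & 0 & 1 \\
1 & 0 & 1 & 1 \\
1 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}, \qquad
\mathbf{H} = \begin{pmatrix}
1 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 1 & 1 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 & 1
\end{pmatrix}.
$$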

Bit position of the data and parity bits

As mentioned above, rows 1, 2, and 4 of G should look familiar as they map the data bits to their parity bits: p1 = d1 + d2 + d4, p2 = d1 + d3 + d4, and p3 = d2 + d3 + d4 (all modulo 2).

The remaining rows (3, 5, 6, 7) map the data bits to their positions in the encoded word; each of these rows contains a single 1, so the corresponding data bit is copied unchanged. In fact, these four rows are linearly independent and form an identity submatrix (by design, not coincidence).

Also as mentioned above, the three rows of H should be familiar. These rows are used to compute the syndrome vector at the receiving end and if the syndrome vector is the null vector (all zeros) then the received word is error-free; if non-zero then the value indicates which bit has been flipped.

The four data bits, assembled as a vector p, are pre-multiplied by G (i.e., Gp) and taken modulo 2 to yield the encoded value that is transmitted. The original 4 data bits are converted to seven bits (hence the name "Hamming(7,4)"), with three parity bits added to ensure even parity using the above data bit coverages. The first table above shows the mapping between each data and parity bit and its final bit position (1 through 7), but this can also be presented in a Venn diagram. The first diagram in this article shows three circles (one for each parity bit) and encloses the data bits that each parity bit covers. The second diagram (shown to the right) is identical but, instead, the bit positions are marked.

For the remainder of this section, the following 4 bits, written as a column vector, will be used as a running example: p = (1, 0, 1, 1), i.e. d1 = 1, d2 = 0, d3 = 1, d4 = 1.

Channel coding

Mapping in the example x. The parity of each of the red, green, and blue circles is even.

Suppose we want to transmit this data (1011) over a noisy communications channel; specifically, a binary symmetric channel, meaning that error corruption does not favor either zero or one (it is symmetric in causing errors). Furthermore, all source vectors are assumed to be equiprobable. We take the product of G and p, with entries modulo 2, to determine the transmitted codeword x: x = Gp = (0, 1, 1, 0, 0, 1, 1).

This means that 0110011 would be transmitted instead of transmitting 1011.

Programmers concerned about multiplication should observe that each row of the result is the least significant bit of the population count of set bits resulting from the row and the data vector being bitwise ANDed together, rather than multiplied.
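As an illustration, the following is a minimal Python sketch (not part of the original article; the function names are illustrative) that encodes a message both by an explicit mod-2 matrix product and by the AND/popcount shortcut just described:

```python
# Illustrative sketch: Hamming(7,4) encoding for the bit order
# p1 p2 d1 p3 d2 d3 d4 used in this article.

# Generator matrix G (7x4): codeword = G * data (mod 2).
G = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

def encode(data):
    """data: list of 4 bits [d1, d2, d3, d4] -> list of 7 code bits."""
    return [sum(g * d for g, d in zip(row, data)) % 2 for row in G]

def encode_bitwise(data_word):
    """Same result via the AND/popcount trick; d1 is the most significant bit."""
    row_masks = [0b1101, 0b1011, 0b1000, 0b0111, 0b0100, 0b0010, 0b0001]
    # Each code bit is the least significant bit of popcount(row AND data).
    return [bin(mask & data_word).count("1") % 2 for mask in row_masks]

print(encode([1, 0, 1, 1]))    # [0, 1, 1, 0, 0, 1, 1] -> 0110011
print(encode_bitwise(0b1011))  # same result
```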

In the adjacent diagram, the seven bits of the encoded word are inserted into their respective locations; from inspection it is clear that the parity of each of the red, green, and blue circles is even.

What will be shown shortly is that if, during transmission, a bit is flipped, then the parity of two or all three circles will be incorrect, and the errored bit can be determined (even if it is one of the parity bits) by knowing that the parity of all three of these circles should be even.

Parity check

If no error occurs during transmission, then the received codeword r is identical to the transmitted codeword x: r = x = (0, 1, 1, 0, 0, 1, 1).

The receiver multiplies H and r to obtain the syndrome vector z, which indicates whether an error has occurred and, if so, which codeword bit it affected. Performing this multiplication (again, entries modulo 2) gives z = Hr = (0, 0, 0).

Since the syndrome z is the null vector, the receiver can conclude that no error has occurred. This conclusion is based on the observation that when the data vector is multiplied by G, a change of basis occurs into a vector subspace that is the kernel of H. As long as nothing happens during transmission, r will remain in the kernel of H and the multiplication will yield the null vector.

Error correction

Otherwise, if a single bit error has occurred, we can write

r = x + e_i

modulo 2, where e_i is the i-th unit vector, that is, a zero vector with a 1 in the i-th place, counting from 1.

Thus the above expression signifies a single bit error in the i-th place.

Now, if we multiply this vector by H:

Hr = H(x + e_i) = Hx + He_i (modulo 2).

Since x is the transmitted data, it is without error, and as a result the product of H and x is zero. Thus

Hr = 0 + He_i = He_i.

Since the product of H with the i-th standard basis vector picks out the i-th column of H, we know the error occurred in the position where that column of H appears.

For example, suppose we have introduced a bit error on bit #5, so that the received word is r = x + e5 = (0, 1, 1, 0, 1, 1, 1).

A bit error on bit 5 causes bad parity in the red and green circles

The diagram to the right shows the bit error (shown in blue text) and the bad parity created (shown in red text) in the red and green circles. The bit error can be detected by computing the parity of the red, green, and blue circles. If a bad parity is detected then the data bit that overlaps only the bad parity circles is the bit with the error. In the above example, the red and green circles have bad parity so the bit corresponding to the intersection of red and green but not blue indicates the errored bit.

Now, z = Hr = (1, 0, 1), which corresponds to the fifth column of H. Furthermore, the general algorithm used (see Hamming code#General algorithm) was intentional in its construction, so that the syndrome of 101 corresponds to the binary value of 5, which indicates that the fifth bit was corrupted. Thus, an error has been detected in bit 5 and can be corrected (simply flip or negate its value): r_corrected = (0, 1, 1, 0, 0, 1, 1).

This corrected received value now indeed matches the transmitted codeword x from above.
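Continuing the illustrative Python sketch from the channel-coding section (again a sketch, not part of the original article), the parity check and single-error correction can be written as:

```python
# Parity-check matrix H for the bit order p1 p2 d1 p3 d2 d3 d4.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def syndrome(received):
    """received: list of 7 bits -> syndrome [z1, z2, z3] (mod 2)."""
    return [sum(h * r for h, r in zip(row, received)) % 2 for row in H]

def correct(received):
    """Return a copy with a single-bit error corrected, if the syndrome is non-zero."""
    z1, z2, z3 = syndrome(received)
    position = z1 * 1 + z2 * 2 + z3 * 4  # syndrome read as a binary number
    word = list(received)
    if position:
        word[position - 1] ^= 1          # flip the indicated bit (1-indexed)
    return word

r = [0, 1, 1, 0, 1, 1, 1]  # the codeword 0110011 with bit 5 flipped
print(syndrome(r))         # [1, 0, 1] -> bit position 5
print(correct(r))          # [0, 1, 1, 0, 0, 1, 1]
```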

Decoding

Once the received vector has been determined to be error-free, or has been corrected if an error occurred (assuming only zero or one bit errors are possible), the received data needs to be decoded back into the original four bits.

First, define a matrix R:
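With the same bit ordering as above, R simply selects the four data-bit positions (3, 5, 6, and 7) of the received word:

$$
\mathbf{R} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}.
$$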

Then the received value p_r is equal to Rr. Using the running example from above, p_r = Rr = (1, 0, 1, 1), recovering the original data 1011.
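In code (continuing the same illustrative sketch), decoding simply reads off the data-bit positions 3, 5, 6, and 7:

```python
def decode(codeword):
    """Extract the data bits [d1, d2, d3, d4] from a 7-bit codeword."""
    return [codeword[2], codeword[4], codeword[5], codeword[6]]

print(decode([0, 1, 1, 0, 0, 1, 1]))  # [1, 0, 1, 1]
```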

Multiple bit errors

Bit errors on bits 4 and 5 are introduced (shown in blue text), with a bad parity only in the green circle (shown in red text)

It is not difficult to show that only single bit errors can be corrected using this scheme. Alternatively, Hamming codes can be used to detect single and double bit errors, by merely noting that the product Hr is non-zero whenever errors have occurred. In the adjacent diagram, bits 4 and 5 were flipped. This yields only one circle (green) with an invalid parity, but the errors are not recoverable.

However, the Hamming(7,4) code and similar Hamming codes cannot distinguish between single-bit errors and two-bit errors; that is, a two-bit error produces the same syndrome as some one-bit error. If error correction is performed on a two-bit error, the result will be incorrect.
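For instance, reusing the illustrative syndrome() and correct() sketch from the error-correction section, a two-bit error is silently "corrected" to the wrong codeword:

```python
r = [0, 1, 1, 1, 1, 1, 1]  # the codeword 0110011 with bits 4 and 5 flipped
print(syndrome(r))         # [1, 0, 0] -> points at bit 1, which is wrong
print(correct(r))          # [1, 1, 1, 1, 1, 1, 1], a valid codeword but not the one sent
```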

Similarly, Hamming codes cannot detect or recover from an arbitrary three-bit error. Consider the diagram: if the bit in the green circle (colored red) were 1, the parity checking would return the null vector, indicating that there is no error in the codeword.

All codewords

Since the source is only 4 bits, there are only 16 possible transmitted words. Included is the eight-bit value obtained if an extra parity bit is used (see Hamming(7,4) code with an additional parity bit). In each transmitted word, bits 3, 5, 6, and 7 are the data bits, bits 1, 2, and 4 are the parity bits, and the eighth bit of the extended code is the extra parity bit.

Data   Hamming(7,4) transmitted   Hamming(8,4) transmitted (with extra parity bit)
0000   0000000                    00000000
1000   1110000                    11100001
0100   1001100                    10011001
1100   0111100                    01111000
0010   0101010                    01010101
1010   1011010                    10110100
0110   1100110                    11001100
1110   0010110                    00101101
0001   1101001                    11010010
1001   0011001                    00110011
0101   0100101                    01001011
1101   1010101                    10101010
0011   1000011                    10000111
1011   0110011                    01100110
0111   0001111                    00011110
1111   1111111                    11111111
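As a sanity check, the rows of this table can be regenerated (in a different order) with a short script built on the illustrative encode() sketch above; it also confirms that the minimum Hamming distance between distinct codewords is 3:

```python
from itertools import product

def extend(codeword):
    """Append the overall even-parity bit to form the Hamming(8,4) word."""
    return codeword + [sum(codeword) % 2]

codewords = []
for data in product([0, 1], repeat=4):
    c = encode(list(data))
    codewords.append(c)
    print("".join(map(str, data)),
          "".join(map(str, c)),
          "".join(map(str, extend(c))))

def distance(a, b):
    """Hamming distance between two equal-length bit lists."""
    return sum(x != y for x, y in zip(a, b))

print(min(distance(a, b)
          for i, a in enumerate(codewords)
          for b in codewords[i + 1:]))  # 3
```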

E7 lattice

The Hamming(7,4) code is closely related to the E7 lattice and, in fact, can be used to construct it, or more precisely, its dual lattice E7* (a similar construction for E7 itself uses the dual code [7,3,4]2). In particular, taking the set of all vectors x in Z7 with x congruent (modulo 2) to a codeword of Hamming(7,4), and rescaling by 1/√2, gives the lattice E7*.
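Concretely, writing C for the set of Hamming(7,4) codewords viewed as vectors of 0s and 1s, the construction described above (an instance of what is usually called Construction A, stated here under the usual 1/√2 normalisation) reads:

$$
E_7^{*} \;\cong\; \tfrac{1}{\sqrt{2}} \left\{ \mathbf{x} \in \mathbb{Z}^7 : \mathbf{x} \bmod 2 \in C \right\}.
$$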

This is a particular instance of a more general relation between lattices and codes. For instance, the extended (8,4) Hamming code, which arises from the addition of a parity bit, is similarly related to the E8 lattice.[2]

References

  1. "History of Hamming Codes". Archived from the original on 2007-10-25. Retrieved 2008-04-03.
  2. Conway, John H.; Sloane, Neil J. A. (1998). Sphere Packings, Lattices and Groups (3rd ed.). New York: Springer-Verlag. ISBN 0-387-98585-9.