Mathematics of cyclic redundancy checks

The cyclic redundancy check (CRC) is based on division in the ring of polynomials over the finite field GF(2) (the integers modulo 2), that is, the set of polynomials where each coefficient is either zero or one, and arithmetic operations wrap around.

Any string of bits can be interpreted as the coefficients of a message polynomial of this sort, and to find the CRC, we multiply the message polynomial by x^n and then find the remainder when dividing by the degree-n generator polynomial. The coefficients of the remainder polynomial are the bits of the CRC.

Formulation

In general, computation of CRC corresponds to Euclidean division of polynomials over GF(2):

M(x) · x^n = Q(x) · G(x) + R(x)

Here M(x) is the original message polynomial and G(x) is the degree-n generator polynomial. The bits of M(x) · x^n are the original message with n zeroes added at the end. The CRC 'checksum' is formed by the coefficients of the remainder polynomial R(x), whose degree is strictly less than n. The quotient polynomial Q(x) is of no interest. Using the modulo operation, it can be stated that

R(x) = M(x) · x^n mod G(x)

In communication, the sender attaches the n bits of R after the original message bits of M, which can be shown to be equivalent to sending out M(x) · x^n + R(x) (the codeword). The receiver, knowing G(x) and therefore n, separates M from R and repeats the calculation, verifying that the received and computed R are equal. If they are, then the receiver assumes the received message bits are correct.

In practice CRC calculations most closely resemble long division in binary, except that the subtractions involved do not borrow from more significant digits, and thus become exclusive or operations.
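As an illustration, the long-division view translates directly into code. The following is a minimal, unoptimized sketch (the function name and bit-string interface are ours, not a standard API):

```python
def crc_remainder(message_bits: str, generator_bits: str) -> str:
    """CRC of a bit string via polynomial long division over GF(2).

    Both arguments are strings of '0'/'1' characters, most significant
    bit first; the generator's leading bit must be 1.  Returns the
    remainder as a bit string of length len(generator_bits) - 1.
    """
    n = len(generator_bits) - 1               # degree of the generator
    # Appending n zero bits multiplies the message polynomial by x^n.
    register = [int(b) for b in message_bits + "0" * n]
    divisor = [int(b) for b in generator_bits]
    for i in range(len(message_bits)):
        if register[i]:                       # leading coefficient is 1
            for j, d in enumerate(divisor):
                register[i + j] ^= d          # subtraction is XOR in GF(2)
    return "".join(str(b) for b in register[-n:])

# The example worked through below: message 111, generator x + 1 (bits 11)
print(crc_remainder("111", "11"))   # -> 1
```

The same routine reproduces any standard CRC that uses no preset or inversion; for example, with the CCITT generator 0x1021 it yields the XModem variant of CRC-16.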

A CRC is a checksum in a strict mathematical sense, as it can be expressed as the weighted modulo-2 sum of per-bit syndromes, but that word is generally reserved more specifically for sums computed using larger moduli, such as 10, 256, or 65535.

CRCs can also be used as part of error-correcting codes, which allow not only the detection of transmission errors, but the reconstruction of the correct message. These codes are based on closely related mathematical principles.

Polynomial arithmetic modulo 2

Since the coefficients are constrained to a single bit, any math operation on CRC polynomials must map the coefficients of the result to either zero or one. For example, in addition:

(x^3 + x) + (x + 1) = x^3 + 2x + 1 ≡ x^3 + 1 (mod 2)

Note that 2x is equivalent to zero in the above equation because addition of coefficients is performed modulo 2:

2x = x + x = x · (1 + 1) ≡ x · 0 = 0 (mod 2)

Polynomial addition modulo 2 is the same as bitwise XOR. Since XOR is its own inverse, polynomial subtraction modulo 2 is also the same as bitwise XOR.

Multiplication is similar (a carry-less product):

(x^2 + x)(x + 1) = x^3 + 2x^2 + x ≡ x^3 + x (mod 2)
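On machine words these rules reduce to XOR and shift-and-XOR. A small sketch (the helper names are ours), with bit i of an integer holding the coefficient of x^i:

```python
def gf2_add(a: int, b: int) -> int:
    """Add (or equally, subtract) GF(2) polynomials: coefficients XOR."""
    return a ^ b

def gf2_mul(a: int, b: int) -> int:
    """Carry-less product: shift-and-XOR instead of shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

assert gf2_add(0b1010, 0b0011) == 0b1001   # (x^3 + x) + (x + 1) = x^3 + 1
assert gf2_mul(0b0110, 0b0011) == 0b1010   # (x^2 + x)(x + 1)    = x^3 + x
```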

We can also divide polynomials mod 2 and find the quotient and remainder. For example, suppose we're dividing x^3 + x^2 + x by x + 1. We would find that

(x^3 + x^2 + x) / (x + 1) = (x^2 + 1) − 1/(x + 1)

In other words,

x^3 + x^2 + x = (x^2 + 1)(x + 1) − 1 ≡ (x^2 + 1)(x + 1) + 1 (mod 2)

The division yields a quotient of x^2 + 1 with a remainder of −1, which, since it is odd, has a last bit of 1.

In the above equations, x^2 + x + 1 represents the original message bits 111, x + 1 is the generator polynomial, and the remainder 1 (equivalently, x^0) is the CRC. The degree of the generator polynomial is 1, so we first multiplied the message by x to get x^3 + x^2 + x.
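The worked division can be checked mechanically. A sketch of Euclidean division on integer bit masks (the function name is ours):

```python
def gf2_divmod(dividend: int, divisor: int) -> tuple[int, int]:
    """Quotient and remainder of GF(2) polynomial division.

    Polynomials are integer bit masks: bit i holds the coefficient of x^i.
    """
    quotient = 0
    while dividend.bit_length() >= divisor.bit_length():
        shift = dividend.bit_length() - divisor.bit_length()
        dividend ^= divisor << shift     # subtract (XOR) the shifted divisor
        quotient |= 1 << shift
    return quotient, dividend

# x^3 + x^2 + x divided by x + 1:
q, r = gf2_divmod(0b1110, 0b11)
print(bin(q), bin(r))   # -> 0b101 0b1  (quotient x^2 + 1, remainder 1)
```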

Variations

There are several standard variations on CRCs, any or all of which may be used with any CRC polynomial. Implementation variations such as endianness and CRC presentation only affect the mapping of bit strings to the coefficients of M(x) and R(x), and do not impact the properties of the algorithm.

The most common variations are:

  1. Instead of computing the CRC of the message and comparing it with the transmitted CRC, the receiver may compute the CRC of the entire received codeword. Since the codeword M(x) · x^n + R(x) is a multiple of G(x), an error-free transmission leaves a remainder of zero. This simplifies many implementations by avoiding the need to treat the last few bytes of the message specially when checking CRCs.
  2. The CRC shift register may be preset to all ones instead of all zeroes, which is equivalent to inverting (complementing) the first n bits of the message. The reason this method is used is because an unmodified CRC does not distinguish between two messages which differ only in the number of leading zeroes, because leading zeroes do not affect the value of M(x). When this inversion is done, the CRC does distinguish between such messages.
  3. The n bits of the CRC may be inverted before being appended to the message.

In practice, the last two variations are invariably used together. They change the transmitted CRC, so they must be implemented at both the transmitter and the receiver. While presetting the shift register to ones is straightforward to do at both ends, inverting the CRC affects receivers implementing the first variation, because the CRC of a full codeword that already includes a CRC is no longer zero. Instead, it is a fixed non-zero pattern: the CRC of the inversion pattern of n ones.

The CRC thus may be checked either by the obvious method of computing the CRC on the message, inverting it, and comparing with the CRC in the message stream, or by calculating the CRC on the entire codeword and comparing it with an expected fixed value C, called the check polynomial, residue or magic number. This is the CRC of the inversion pattern of n ones; equivalently, it may be computed as the unmodified CRC of a message consisting of n one bits.

These inversions are extremely common but not universally performed, even in the case of the CRC-32 or CRC-16-CCITT polynomials.
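Python's zlib module implements CRC-32 with both inversions, so it can illustrate the fixed-residue check; the constant 0x2144DF1C below is the well-known CRC-32 magic number:

```python
import struct
import zlib

msg = b"Hello, CRC!"
crc = zlib.crc32(msg)   # shift register preset to ones, result inverted

# Method 1: recompute the CRC of the message and compare.
assert zlib.crc32(msg) == crc

# Method 2: run the CRC over the whole codeword (message plus CRC,
# appended least significant byte first).  Because of the inversions
# the result is not zero but the fixed CRC-32 magic number.
codeword = msg + struct.pack("<I", crc)
assert zlib.crc32(codeword) == 0x2144DF1C
```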

Reversed representations and reciprocal polynomials

Polynomial representations

Example of CCITT 16-bit Polynomial in the forms described (bits inside square brackets are included in the word representation; bits outside are implied 1 bits; vertical bars designate nibble boundaries):

16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0   coefficient
 1 [0  0  0  1 |0  0  0  0 |0  0  1  0 |0  0  0  1]  Normal
   [     1     |     0     |     2     |     1    ]  Nibbles of Normal
                                                     0x1021

 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16   coefficient
[1  0  0  0 |0  1  0  0 |0  0  0  0 |1  0  0  0] 1   Reverse
[     8     |     4     |     0     |     8    ]     Nibbles of Reverse
                                                     0x8408

16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0   coefficient
 1 [0  0  0  0 |1  0  0  0 |0  0  0  1 |0  0  0  1]  Reciprocal
   [     0     |     8     |     1     |     1    ]  Nibbles of Reciprocal
                                                     0x0811

 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16   Reverse reciprocal
16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0   Koopman
[1  0  0  0 |1  0  0  0 |0  0  0  1 |0  0  0  0] 1
[     8     |     8     |     1     |     0    ]     Nibbles
                                                     0x8810

All the well-known CRC generator polynomials of degree n have two common hexadecimal representations. In both cases, the coefficient of x^n is omitted and understood to be 1.

The msbit-first form is often referred to in the literature as the normal representation, while the lsbit-first form is called the reversed representation. It is essential to use the correct form when implementing a CRC. Because the coefficient of x^0 is always 1, the reversed form always has its most significant bit set; so if the coefficient of x^(n-1) happens to be zero, the forms can be distinguished at a glance by seeing which end has the bit set.

To further confuse the matter, the paper by P. Koopman and T. Chakravarty [1] [2] converts CRC generator polynomials to hexadecimal numbers in yet another way: msbit-first, but including the x^n coefficient and omitting the x^0 coefficient. This "Koopman" representation has the advantage that the degree can be determined from the hexadecimal form, and the coefficients are easy to read off in left-to-right order. However, it is not used anywhere else and is not recommended due to the risk of confusion.
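The four representations are mechanical bit rearrangements of one another, as a short sketch shows (the helper name is ours), using the CCITT polynomial from the table above:

```python
def bit_reverse(value: int, width: int) -> int:
    """Reverse the low `width` bits of `value`."""
    return int(f"{value:0{width}b}"[::-1], 2)

n, normal = 16, 0x1021            # CRC-16-CCITT: x^16 + x^12 + x^5 + 1
full = (1 << n) | normal          # all n+1 coefficients, x^16 down to x^0

reverse    = bit_reverse(normal, n)     # lsbit first, x^16 still implied
koopman    = full >> 1                  # keeps x^16, drops the x^0 bit
reciprocal = bit_reverse(full, n + 1) & ((1 << n) - 1)  # x^16 G(1/x), top bit implied

print(f"{reverse:#06x} {koopman:#06x} {reciprocal:#06x}")   # -> 0x8408 0x8810 0x0811
```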

Reciprocal polynomials

A reciprocal polynomial is created by assigning the x^0 through x^n coefficients of one polynomial to the x^n through x^0 coefficients of a new polynomial. That is, the reciprocal of the degree-n polynomial G(x) is x^n G(1/x).

The most interesting property of reciprocal polynomials, when used in CRCs, is that they have exactly the same error-detecting strength as the polynomials they are reciprocals of. The reciprocal of a polynomial generates the same codewords, only bit reversed; that is, if all but the first n bits of a codeword under the original polynomial are taken, reversed and used as a new message, the CRC of that message under the reciprocal polynomial equals the reverse of the first n bits of the original codeword. But the reciprocal polynomial is not the same as the original polynomial, and the CRCs generated using it are not the same (even modulo bit reversal) as those generated by the original polynomial.
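The bit-reversal relationship can be checked exhaustively for a small generator. A sketch with a degree-3 polynomial (helper names ours):

```python
def gf2_mod(dividend: int, divisor: int) -> int:
    """Remainder of GF(2) polynomial division on integer bit masks."""
    while dividend.bit_length() >= divisor.bit_length():
        dividend ^= divisor << (dividend.bit_length() - divisor.bit_length())
    return dividend

def bit_reverse(value: int, width: int) -> int:
    return int(f"{value:0{width}b}"[::-1], 2)

g  = 0b1011              # x^3 + x + 1
gr = bit_reverse(g, 4)   # its reciprocal, x^3 + x^2 + 1

for msg in range(1, 16):                               # every 4-bit message
    codeword = (msg << 3) | gf2_mod(msg << 3, g)       # message plus CRC, 7 bits
    assert gf2_mod(codeword, g) == 0                   # divisible by g
    assert gf2_mod(bit_reverse(codeword, 7), gr) == 0  # reversed: divisible by gr
```

Every codeword of the original polynomial, read backwards, is a codeword of the reciprocal polynomial.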

Error detection strength

The error-detection ability of a CRC depends on the degree of its key polynomial and on the specific key polynomial used. The "error polynomial" is the symmetric difference of the received message codeword and the correct message codeword. An error will go undetected by a CRC algorithm if and only if the error polynomial is divisible by the CRC polynomial.

(As an aside, there is never reason to use a polynomial with a zero x^0 term. Recall that a CRC is the remainder of the message polynomial times x^n divided by the CRC polynomial. A polynomial with a zero x^0 term always has x as a factor. So if K(x) is the original CRC polynomial and K(x) = x · K'(x), then

M(x) · x^n mod K(x) = x · (M(x) · x^(n-1) mod K'(x))

That is, the CRC of any message with the K(x) polynomial is the same as that of the same message with the K'(x) polynomial with a zero appended. It is just a waste of a bit.)
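This waste is easy to see numerically. A sketch (the helper name is ours) comparing K(x) = x · K'(x) against K'(x):

```python
def gf2_mod(dividend: int, divisor: int) -> int:
    """Remainder of GF(2) polynomial division on integer bit masks."""
    while dividend.bit_length() >= divisor.bit_length():
        dividend ^= divisor << (dividend.bit_length() - divisor.bit_length())
    return dividend

k_prime = 0b1011         # K'(x) = x^3 + x + 1, nonzero x^0 term
k       = k_prime << 1   # K(x) = x*K'(x) = x^4 + x^2 + x, zero x^0 term

for msg in range(1, 64):
    crc_k      = gf2_mod(msg << 4, k)        # 4-bit CRC using K
    crc_kprime = gf2_mod(msg << 3, k_prime)  # 3-bit CRC using K'
    assert crc_k == crc_kprime << 1          # same CRC with a zero appended
```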

The combination of these factors means that good CRC polynomials are often primitive polynomials (which have the best 2-bit error detection) or primitive polynomials of degree n - 1, multiplied by x + 1 (which detects all odd numbers of bit errors, and has half the two-bit error detection ability of a primitive polynomial of degree n). [1]

Bitfilters

The analysis technique of bitfilters [1] allows one to determine the properties of a given generator polynomial very efficiently. The results are the following:

  1. All burst errors (but one) with length no longer than the generator polynomial can be detected by any generator polynomial. This includes 1-bit errors (a burst of length 1). The maximum length is n + 1, where n is the degree of the generator polynomial (which itself has a length of n + 1). The exception to this result is a burst whose bit pattern is the same as that of the generator polynomial.
  2. All errors affecting an odd number of bits are detected by generator polynomials with an even number of terms.
  3. 2-bit errors separated by a distance equal to a multiple of the period of the longest bitfilter of even parity belonging to a generator polynomial are not detected; all others are detected. For degrees n up to 32 there is an optimal generator polynomial with that degree and an even number of terms; in this case the period mentioned above is 2^(n-1) - 1. For n = 16 this means that blocks of 32,767 bits in length do not contain undetected 2-bit errors. For an odd number of terms in the generator polynomial there can be a period of 2^n - 1; however, these generator polynomials (with an odd number of terms) do not detect all odd numbers of errors, so they should be avoided. A list of the corresponding generators with an even number of terms can be found in the link mentioned at the beginning of this section.
  4. All single-bit errors within the bitfilter period mentioned above (for even terms in the generator polynomial) can be identified uniquely by their residual. So the CRC method can be used to correct single-bit errors as well (within those limits, e.g. 32,767 bits with optimal generator polynomials of degree 16). Since all odd errors leave an odd residual and all even errors an even residual, 1-bit errors and 2-bit errors can be distinguished. However, like other SECDED techniques, CRCs cannot always distinguish between 1-bit errors and 3-bit errors. When 3 or more bit errors occur in a block, CRC bit error correction will itself be erroneous and produce more errors.
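The period quoted for n = 16 can be verified directly: a 2-bit error x^i + x^j = x^i (x^(j-i) + 1) escapes detection exactly when j - i is a multiple of the order of x modulo the generator. A sketch for CRC-16-CCITT, which has an even number of terms:

```python
g = 0x11021        # full CRC-16-CCITT generator, x^16 + x^12 + x^5 + 1

power = 1          # x^0 mod g
order = 0
while True:
    order += 1
    power <<= 1                # multiply by x
    if (power >> 16) & 1:
        power ^= g             # reduce modulo g when degree reaches 16
    if power == 1:             # smallest k with x^k = 1 (mod g)
        break

print(order)   # -> 32767, so 2-bit errors are caught in blocks up to 32767 bits
```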

See also

  - Cyclic redundancy check
  - Computation of cyclic redundancy checks
  - Cyclic code
  - Polynomial code
  - Polynomial ring
  - Polynomial greatest common divisor
  - Primitive polynomial (field theory)
  - Reciprocal polynomial
  - Block code
  - Linear code
  - BCH code
  - Reed–Solomon error correction
  - Reed–Muller code
  - Hadamard code
  - Berlekamp–Welch algorithm
  - Folded Reed–Solomon code
  - List decoding
  - Burst error-correcting code
  - Commitment scheme
  - NTRUEncrypt

References

  1. Koopman, Philip (July 2002). "32-bit cyclic redundancy codes for Internet applications". Proceedings of the International Conference on Dependable Systems and Networks (DSN 2002). pp. 459–468. doi:10.1109/DSN.2002.1028931. – verification of Castagnoli's results by exhaustive search and some new good polynomials
  2. Koopman, Philip; Chakravarty, Tridib (June 2004). "Cyclic redundancy code (CRC) polynomial selection for embedded networks". Proceedings of the International Conference on Dependable Systems and Networks (DSN 2004). pp. 145–154. doi:10.1109/DSN.2004.1311885. – analysis of short CRC polynomials for embedded applications