Verhoeff algorithm

Last updated

The Verhoeff algorithm [1] is a checksum for error detection first published by Dutch mathematician Jacobus Verhoeff in 1969. [2] [3] It was the first decimal check digit algorithm which detects all single-digit errors, and all transposition errors involving two adjacent digits, [4] which was at the time thought impossible with such a code.

Contents

The method was independently discovered by H. Peter Gumm in 1985, this time including a formal proof and an extension to any base. [5]

Goals

Verhoeff had the goal of finding a decimal code—one where the check digit is a single decimal digit—which detected all single-digit errors and all transpositions of adjacent digits. At the time, supposed proofs of the nonexistence [6] of these codes made base-11 codes popular, for example in the ISBN check digit.

His goals were also practical, and he based the evaluation of different codes on live data from the Dutch postal system, using a weighted points system for different kinds of error. The analysis broke the errors down into a number of categories: first, by how many digits are in error; for those with two digits in error, there are transpositions (abba), twins (aa → 'bb'), jump transpositions (abccba), phonetic (1aa0), and jump twins (abacbc). Additionally there are omitted and added digits. Although the frequencies of some of these kinds of errors might be small, some codes might be immune to them in addition to the primary goals of detecting all singles and transpositions.

The phonetic errors in particular showed linguistic effects, because in Dutch, numbers are typically read in pairs; and also while 50 sounds similar to 15 in Dutch, 80 doesn't sound like 18.

Taking six-digit numbers as an example, Verhoeff reported the following classification of the errors:.

Digits in errorClassificationCountFrequency
1Transcription9,57479.05%
2Transpositions1,23710.21%
Twins670.55%
Phonetic590.49%
Other adjacent2321.92%
Jump transpositions990.82%
Jump Twins350.29%
Other jump errors430.36%
Other980.81%
31691.40%
41180.97%
52191.81%
61621.34%
Total12,112

Description

The general idea of the algorithm is to represent each of the digits (0 through 9) as elements of the dihedral group . That is, map digits to , manipulate these, then map back into digits. Let this mapping be

Let the nth digit be and let the number of digits be .

For example given the code 248 then is 3 and .

Now define the permutation

For example . Another example is since

Using multiplicative notation for the group operation of , the check digit is then simply a value such that

is explicitly given by inverse permutation

For example the check digit for 248 is 5. To verify this, use the mapping to and insert into the LHS of the previous equation

To evaluate this permutation quickly use that

to get that

This is the same reflection being iteratively multiplied. Use that reflections are their own inverse. [7]

In practice the algorithm is implemented using simple lookup tables without needing to understand how to generate those tables from the underlying group and permutation theory. This is more properly considered a family of algorithms, as other permutations work too. Verhoeff's notes that the particular permutation, given above, is special as it has the property of detecting 95.3% of the phonetic errors. [8]

The strengths of the algorithm are that it detects all transliteration and transposition errors, and additionally most twin, twin jump, jump transposition and phonetic errors.

The main weakness of the Verhoeff algorithm is its complexity. The calculations required cannot easily be expressed as a formula in say . Lookup tables are required for easy calculation. A similar code is the Damm algorithm, which has similar qualities.

Table-based algorithm

The Verhoeff algorithm can be implemented using three tables: a multiplication table d, an inverse table inv, and a permutation table p.

The first table, d, is based on multiplication in the dihedral group D5. [7] and is simply the Cayley table of the group. Note that this group is not commutative, that is, for some values of j and k, d(j,k) ≠ d(k, j).

The inverse table inv represents the multiplicative inverse of a digit, that is, the value that satisfies d(j, inv(j)) = 0.

The permutation table p applies a permutation to each digit based on its position in the number. This is actually a single permutation (1 5 8 9 4 2 7 0)(3 6) applied iteratively; i.e. p(i+j,n) = p(i, p(j,n)).

The Verhoeff checksum calculation is performed as follows:

  1. Create an array n out of the individual digits of the number, taken from right to left (rightmost digit is n0, etc.).
  2. Initialize the checksum c to zero.
  3. For each index i of the array n, starting at zero, replace c with .

The original number is valid if and only if .

To generate a check digit, append a 0, perform the calculation: the correct check digit is .

Examples

See also

Related Research Articles

<span class="mw-page-title-main">Symmetric group</span> Type of group in abstract algebra

In abstract algebra, the symmetric group defined over any set is the group whose elements are all the bijections from the set to itself, and whose group operation is the composition of functions. In particular, the finite symmetric group defined over a finite set of symbols consists of the permutations that can be performed on the symbols. Since there are such permutation operations, the order of the symmetric group is .

In coding theory, the Bose–Chaudhuri–Hocquenghem codes form a class of cyclic error-correcting codes that are constructed using polynomials over a finite field. BCH codes were invented in 1959 by French mathematician Alexis Hocquenghem, and independently in 1960 by Raj Chandra Bose and D.K. Ray-Chaudhuri. The name Bose–Chaudhuri–Hocquenghem arises from the initials of the inventors' surnames.

<span class="mw-page-title-main">Permutation</span> Mathematical version of an order change

In mathematics, a permutation of a set can mean one of two different things:

Reed–Solomon codes are a group of error-correcting codes that were introduced by Irving S. Reed and Gustave Solomon in 1960. They have many applications, including consumer technologies such as MiniDiscs, CDs, DVDs, Blu-ray discs, QR codes, Data Matrix, data transmission technologies such as DSL and WiMAX, broadcast systems such as satellite communications, DVB and ATSC, and storage systems such as RAID 6.

In mathematics, particularly in matrix theory, a permutation matrix is a square binary matrix that has exactly one entry of 1 in each row and each column with all other entries 0. An n × n permutation matrix can represent a permutation of n elements. Pre-multiplying an n-row matrix M by a permutation matrix P, forming PM, results in permuting the rows of M, while post-multiplying an n-column matrix M, forming MP, permutes the columns of M.

In arithmetic, long division is a standard division algorithm suitable for dividing multi-digit Hindu-Arabic numerals that is simple enough to perform by hand. It breaks down a division problem into a series of easier steps.

In mathematics, and in particular in group theory, a cyclic permutation is a permutation consisting of a single cycle. In some cases, cyclic permutations are referred to as cycles; if a cyclic permutation has k elements, it may be called a k-cycle. Some authors widen this definition to include permutations with fixed points in addition to at most one non-trivial cycle. In cycle notation, cyclic permutations are denoted by the list of their elements enclosed with parentheses, in the order to which they are permuted.

A check digit is a form of redundancy check used for error detection on identification numbers, such as bank account numbers, which are used in an application where they will at least sometimes be input manually. It is analogous to a binary parity bit used to check for errors in computer-generated data. It consists of one or more digits computed by an algorithm from the other digits in the sequence input.

The Luhn algorithm or Luhn formula, also known as the "modulus 10" or "mod 10" algorithm, named after its creator, IBM scientist Hans Peter Luhn, is a simple check digit formula used to validate a variety of identification numbers.

In combinatorics, the factorial number system, also called factoradic, is a mixed radix numeral system adapted to numbering permutations. It is also called factorial base, although factorials do not function as base, but as place value of digits. By converting a number less than n! to factorial representation, one obtains a sequence of n digits that can be converted to a permutation of n elements in a straightforward way, either using them as Lehmer code or as inversion table representation; in the former case the resulting map from integers to permutations of n elements lists them in lexicographical order. General mixed radix systems were studied by Georg Cantor.

In coding theory, a linear code is an error-correcting code for which any linear combination of codewords is also a codeword. Linear codes are traditionally partitioned into block codes and convolutional codes, although turbo codes can be seen as a hybrid of these two types. Linear codes allow for more efficient encoding and decoding algorithms than other codes.

Reed–Muller codes are error-correcting codes that are used in wireless communications applications, particularly in deep-space communication. Moreover, the proposed 5G standard relies on the closely related polar codes for error correction in the control channel. Due to their favorable theoretical and mathematical properties, Reed–Muller codes have also been extensively studied in theoretical computer science.

Methods of computing square roots are algorithms for approximating the non-negative square root of a positive real number . Since all square roots of natural numbers, other than of perfect squares, are irrational, square roots can usually only be computed to some finite precision: these methods typically construct a series of increasingly accurate approximations.

A division algorithm is an algorithm which, given two integers N and D, computes their quotient and/or remainder, the result of Euclidean division. Some are applied by hand, while others are employed by digital circuit designs and software.

Approximations of <span class="texhtml mvar" style="font-style:italic;">π</span> Varying methods used to calculate pi

Approximations for the mathematical constant pi in the history of mathematics reached an accuracy within 0.04% of the true value before the beginning of the Common Era. In Chinese mathematics, this was improved to approximations correct to what corresponds to about seven decimal digits by the 5th century.

A repeating decimal or recurring decimal is a decimal representation of a number whose digits are eventually periodic ; if this sequence consists only of zeros, the decimal is said to be terminating, and is not considered as repeating. It can be shown that a number is rational if and only if its decimal representation is repeating or terminating. For example, the decimal representation of 1/3 becomes periodic just after the decimal point, repeating the single digit "3" forever, i.e. 0.333.... A more complicated example is 3227/555, whose decimal becomes periodic at the second digit following the decimal point and then repeats the sequence "144" forever, i.e. 5.8144144144.... Another example of this is 593/53, which becomes periodic after the decimal point, repeating the 13-digit pattern "1886792452830" forever, i.e. 11.18867924528301886792452830....

The Chudnovsky algorithm is a fast method for calculating the digits of π, based on Ramanujan's π formulae. Published by the Chudnovsky brothers in 1988, it was used to calculate π to a billion decimal places.

In coding theory, a polynomial code is a type of linear code whose set of valid code words consists of those polynomials that are divisible by a given fixed polynomial.

The digits of some specific integers permute or shift cyclically when they are multiplied by a number n. Examples are:

In error detection, the Damm algorithm is a check digit algorithm that detects all single-digit errors and all adjacent transposition errors. It was presented by H. Michael Damm in 2004, as a part of his PhD dissertation entitled Totally Antisymmetric Quasigroups.

References

  1. Verhoeff, J. (1969). Error Detecting Decimal Codes (Tract 29). The Mathematical Centre, Amsterdam. Bibcode:1971ZaMM...51..240N. doi:10.1002/zamm.19710510323.
  2. Kirtland, Joseph (2001). "5. Group Theory and the Verhoeff Check Digit Scheme". Identification Numbers and Check Digit Schemes. Mathematical Association of America. p. 153. ISBN   0-88385-720-0.
  3. Salomon, David (2005). "§2.11 The Verhoeff Check Digit Method". Coding for Data and Computer Communications. Springer. pp. 56–58. ISBN   0-387-21245-0.
  4. Haunsperger, Deanna; Kennedy, Stephen, eds. (2006). The Edge of the Universe: Celebrating Ten Years of Math Horizons. Mathematical Association of America. p. 38. ISBN   978-0-88385-555-3. LCCN   2005937266.
  5. Gumm, H. (January 1985). "A new class of check-digit methods for arbitrary number systems (Corresp.)". IEEE Transactions on Information Theory. 31 (1): 102–105. doi:10.1109/TIT.1985.1056991.
  6. Sisson, Roger L. (May 1958). "An improved decimal redundancy check". Communications of the ACM. 1 (5): 10–12. doi: 10.1145/368819.368854 .
  7. 1 2 Gallian, Joseph A. (2010). Contemporary Abstract Algebra (7th ed.). Brooks/Cole. p.  111. ISBN   978-0-547-16509-7. LCCN   2008940386 . Retrieved August 26, 2011. verhoeff check digit.
  8. Verhoeff 1969 , p. 95
  9. Verhoeff 1969 , p. 83