Verhoeff algorithm

The Verhoeff algorithm [1] is a checksum for error detection first published by Dutch mathematician Jacobus Verhoeff in 1969. [2] [3] It was the first decimal check digit algorithm that detects all single-digit errors and all transposition errors involving two adjacent digits, [4] something that was at the time thought to be impossible with such a code.

The method was independently discovered by H. Peter Gumm in 1985, this time including a formal proof and an extension to any base. [5]

Goals

Verhoeff had the goal of finding a decimal code—one where the check digit is a single decimal digit—which detected all single-digit errors and all transpositions of adjacent digits. At the time, supposed proofs of the nonexistence [6] of these codes made base-11 codes popular, for example in the ISBN check digit.

His goals were also practical, and he based the evaluation of different codes on live data from the Dutch postal system, using a weighted points system for different kinds of error. The analysis broke the errors down into a number of categories: first, by how many digits are in error; for those with two digits in error, there are transpositions (ab → ba), twins (aa → bb), jump transpositions (abc → cba), phonetic (1a → a0), and jump twins (aba → cbc). Additionally there are omitted and added digits. Although the frequencies of some of these kinds of errors might be small, some codes might be immune to them in addition to the primary goals of detecting all singles and transpositions.

The phonetic errors in particular showed linguistic effects: in Dutch, numbers are typically read in pairs, and while 50 sounds similar to 15 in Dutch, 80 does not sound like 18.

Taking six-digit numbers as an example, Verhoeff reported the following classification of the errors:

Digits in error   Classification        Count    Frequency
1                 Transcription          9,574    79.05%
2                 Transpositions         1,237    10.21%
2                 Twins                     67     0.55%
2                 Phonetic                  59     0.49%
2                 Other adjacent           232     1.92%
2                 Jump transpositions       99     0.82%
2                 Jump twins                35     0.29%
2                 Other jump errors         43     0.36%
2                 Other                     98     0.81%
3                                          169     1.40%
4                                          118     0.97%
5                                          219     1.81%
6                                          162     1.34%
Total                                   12,112

Description

The general idea of the algorithm is to represent each of the digits (0 through 9) as elements of the dihedral group $D_5$. That is, map digits to $D_5$, manipulate these, then map back into digits. Let this mapping be $m : \{0, 1, \dots, 9\} \to D_5$,

$$m = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ e & r & r^2 & r^3 & r^4 & s & rs & r^2s & r^3s & r^4s \end{pmatrix}$$

where $e$ is the identity, $r$ is a rotation of order 5 and $s$ is a reflection.

Let the $n$th digit be $a_n$, counting from the right and starting at $n = 1$, and let the number of digits be $k$.

For example, given the code 248, $k$ is 3 and $a_1 = 8$, $a_2 = 4$, $a_3 = 2$.

Now define the permutation $f : D_5 \to D_5$

$$f = \begin{pmatrix} e & r & r^2 & r^3 & r^4 & s & rs & r^2s & r^3s & r^4s \\ r & s & r^2s & rs & r^2 & r^3s & r^3 & e & r^4s & r^4 \end{pmatrix}$$

For example, $f(r^2) = r^2 s$. Another example is $f^2(e) = s$, since $f^2(e) = f(f(e)) = f(r) = s$.

Using multiplicative notation for the group operation of $D_5$, the check digit is then simply a value $c$ such that

$$m(c) \cdot f^1(m(a_1)) \cdot f^2(m(a_2)) \cdots f^k(m(a_k)) = e.$$

$c$ is given explicitly by inverting this relation:

$$c = m^{-1}\!\left(\left(f^1(m(a_1)) \cdot f^2(m(a_2)) \cdots f^k(m(a_k))\right)^{-1}\right).$$

For example, the check digit for 248 is 2 (the same value is produced by the table-based method described below). To verify this, map the digits to $D_5$ and insert them into the left-hand side of the previous equation:

$$m(2) \cdot f(m(8)) \cdot f^2(m(4)) \cdot f^3(m(2)) = r^2 \cdot r^4 s \cdot r^2 s \cdot r.$$

To evaluate this product quickly, use the defining relations of the dihedral group,

$$r^5 = s^2 = e \quad\text{and}\quad s r = r^{-1} s \ (\text{so } s r^j = r^{-j} s),$$

to get that

$$r^2 \cdot r^4 s \cdot r^2 s \cdot r = r^2 \cdot r^4 r^{-2} (s \cdot s) \cdot r = r^2 \cdot r^2 \cdot r = r^5 = e.$$

Moving the rotations past the reflection brings the two copies of the same reflection $s$ together; since reflections are their own inverses, they cancel, leaving only rotations. [7]
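
The calculation above can also be checked mechanically. The following short sketch (illustrative code, not taken from Verhoeff's paper; the digit encoding of $D_5$ elements and the helper names are choices made here) multiplies out the group elements and recovers the check digit 2 for 248:

```python
# A small sketch that redoes the calculation above by brute force.
# D5 elements are encoded as digits through the mapping m:
# 0..4 are the rotations r^0..r^4, 5..9 are the reflections r^0*s..r^4*s.

def mult(a, b):
    """Group operation of D5 on digit-encoded elements (this corresponds to the
    multiplication table d used later in the article)."""
    ja, sa = a % 5, a // 5            # a = r^ja * s^sa
    jb, sb = b % 5, b // 5            # b = r^jb * s^sb
    j = (ja - jb) % 5 if sa else (ja + jb) % 5   # uses s*r^j = r^(-j)*s
    return j + 5 * ((sa + sb) % 2)

# The permutation f written at the digit level: the cycle (1 5 8 9 4 2 7 0)(3 6).
F = [1, 5, 7, 6, 2, 8, 3, 0, 9, 4]

def f_pow(n, x):
    """Apply f to the digit x, n times."""
    for _ in range(n):
        x = F[x]
    return x

def check_digit(code):
    """Find c with m(c)*f(a1)*f^2(a2)*...*f^k(ak) = e, digits a1..ak taken right to left."""
    prod = 0                                          # the identity element e
    for n, a_n in enumerate(reversed(code), start=1):
        prod = mult(prod, f_pow(n, a_n))
    return next(c for c in range(10) if mult(c, prod) == 0)

print(check_digit([2, 4, 8]))   # prints 2, as in the worked example above
```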

In practice the algorithm is implemented using simple lookup tables, without needing to understand how to generate those tables from the underlying group and permutation theory. This is more properly considered a family of algorithms, as other permutations work too. Verhoeff notes that the particular permutation given above is special, as it has the property of detecting 95.3% of the phonetic errors. [8]

The strengths of the algorithm are that it detects all transcription (single-digit) and adjacent transposition errors, and additionally most twin, jump twin, jump transposition and phonetic errors.

The main weakness of the Verhoeff algorithm is its complexity. The calculations required cannot easily be expressed as a single formula in, say, arithmetic modulo 10; lookup tables are required for easy calculation. A similar code is the Damm algorithm, which has comparable qualities and likewise relies on a lookup table.

Table-based algorithm

The Verhoeff algorithm can be implemented using three tables: a multiplication table d, an inverse table inv, and a permutation table p.

The first table, d, is based on multiplication in the dihedral group D5 [7] and is simply the Cayley table of the group. Note that this group is not commutative, that is, for some values of j and k, d(j, k) ≠ d(k, j).

The inverse table inv represents the multiplicative inverse of a digit, that is, the value that satisfies d(j, inv(j)) = 0.

The permutation table p applies a permutation to each digit based on its position in the number. This is actually a single permutation (1 5 8 9 4 2 7 0)(3 6) applied iteratively; i.e. p(i+j,n) = p(i, p(j,n)).
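
Because all three tables follow from these definitions, they can be generated rather than copied by hand. The following sketch (illustrative; the helper names are choices made here, not part of any standard API) builds d as the Cayley table of D5 under the digit encoding from the Description section, derives inv from d, builds p by iterating the permutation, and spot-checks the properties just stated:

```python
# Build the three Verhoeff tables from their definitions (a sketch; names are
# illustrative). Digits 0-4 stand for the rotations r^0..r^4 of D5 and digits
# 5-9 for the reflections r^0*s..r^4*s.

def mult(a, b):
    """Group operation of D5 on digit-encoded elements."""
    ja, sa = a % 5, a // 5
    jb, sb = b % 5, b // 5
    j = (ja - jb) % 5 if sa else (ja + jb) % 5   # s*r^j = r^(-j)*s
    return j + 5 * ((sa + sb) % 2)

# d: the Cayley table of D5.
d = [[mult(j, k) for k in range(10)] for j in range(10)]

# inv: the unique value with d[j][inv[j]] == 0.
inv = [next(k for k in range(10) if d[j][k] == 0) for j in range(10)]

# p: successive powers of the permutation (1 5 8 9 4 2 7 0)(3 6); row 0 is the identity.
sigma = [1, 5, 7, 6, 2, 8, 3, 0, 9, 4]
p = [list(range(10))]
for _ in range(7):
    p.append([sigma[x] for x in p[-1]])

# The group is not commutative ...
assert any(d[j][k] != d[k][j] for j in range(10) for k in range(10))
# ... and p(i+j, n) = p(i, p(j, n)), indices taken mod 8 since the permutation has order 8.
assert all(p[(i + j) % 8][n] == p[i][p[j][n]]
           for i in range(8) for j in range(8) for n in range(10))
```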

The Verhoeff checksum calculation is performed as follows:

  1. Create an array n out of the individual digits of the number, taken from right to left (rightmost digit is n0, etc.).
  2. Initialize the checksum c to zero.
  3. For each index i of the array n, starting at zero, replace c with d(c, p(i mod 8, ni)).

The original number is valid if and only if c = 0.

To generate a check digit, append a 0, perform the calculation: the correct check digit is inv(c).
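
Putting the steps together, the procedure can be written compactly as follows (a sketch under the conventions above, not a reference implementation; the function names are illustrative). The tables are rebuilt here in the same way as in the previous sketch so that the snippet runs on its own:

```python
# Verhoeff checksum, validation and check digit generation following steps 1-3
# above (an illustrative sketch).

def _mult(a, b):
    ja, sa, jb, sb = a % 5, a // 5, b % 5, b // 5
    return ((ja - jb) % 5 if sa else (ja + jb) % 5) + 5 * ((sa + sb) % 2)

d = [[_mult(j, k) for k in range(10)] for j in range(10)]              # Cayley table of D5
inv = [next(k for k in range(10) if d[j][k] == 0) for j in range(10)]  # group inverses
sigma = [1, 5, 7, 6, 2, 8, 3, 0, 9, 4]                                 # (1 5 8 9 4 2 7 0)(3 6)
p = [list(range(10))]
for _ in range(7):
    p.append([sigma[x] for x in p[-1]])

def checksum(number: str) -> int:
    """Steps 1-3: take the digits right to left and fold them through p and d."""
    c = 0
    for i, digit in enumerate(reversed(number)):
        c = d[c][p[i % 8][int(digit)]]
    return c

def validate(number: str) -> bool:
    """A number with its check digit appended is valid iff the checksum is 0."""
    return checksum(number) == 0

def generate_check_digit(number: str) -> int:
    """Append a 0, compute the checksum, and return the inverse of the result."""
    return inv[checksum(number + "0")]

print(generate_check_digit("248"))   # 2
print(validate("2482"))              # True
print(validate("2483"))              # False (a single-digit error is caught)
```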

Examples
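
For instance, to generate a check digit for 248 using the tables constructed above, append a 0 to obtain 2480, so that n = (0, 8, 4, 2) reading from the right, and start with c = 0:

  c = d(0, p(0, 0)) = d(0, 0) = 0
  c = d(0, p(1, 8)) = d(0, 9) = 9
  c = d(9, p(2, 4)) = d(9, 7) = 2
  c = d(2, p(3, 2)) = d(2, 1) = 3

The check digit is inv(3) = 2, giving 2482. Running the same loop over 2482 itself ends with c = 0, so 2482 validates, in agreement with the group-theoretic calculation in the Description section.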

See also

  - Check digit
  - Damm algorithm
  - Luhn algorithm

References

  1. Verhoeff, J. (1969). "Error Detecting Decimal Codes (Tract 29)". Zeitschrift für Angewandte Mathematik und Mechanik. 51 (3). The Mathematical Centre, Amsterdam: 240. Bibcode:1971ZaMM...51..240N. doi:10.1002/zamm.19710510323.
  2. Kirtland, Joseph (2001). "5. Group Theory and the Verhoeff Check Digit Scheme". Identification Numbers and Check Digit Schemes. Mathematical Association of America. p. 153. ISBN 0-88385-720-0.
  3. Salomon, David (2005). "§2.11 The Verhoeff Check Digit Method". Coding for Data and Computer Communications. Springer. pp. 56–58. ISBN 0-387-21245-0.
  4. Haunsperger, Deanna; Kennedy, Stephen, eds. (2006). The Edge of the Universe: Celebrating Ten Years of Math Horizons. Mathematical Association of America. p. 38. ISBN 978-0-88385-555-3. LCCN 2005937266.
  5. Gumm, H. (January 1985). "A new class of check-digit methods for arbitrary number systems (Corresp.)". IEEE Transactions on Information Theory. 31 (1): 102–105. doi:10.1109/TIT.1985.1056991.
  6. Sisson, Roger L. (May 1958). "An improved decimal redundancy check". Communications of the ACM. 1 (5): 10–12. doi:10.1145/368819.368854.
  7. Gallian, Joseph A. (2010). Contemporary Abstract Algebra (7th ed.). Brooks/Cole. p. 111. ISBN 978-0-547-16509-7. LCCN 2008940386.
  8. Verhoeff 1969, p. 95.
  9. Verhoeff 1969, p. 83.