Check digit

A check digit is a form of redundancy check used for error detection on identification numbers, such as bank account numbers, which are used in an application where they will at least sometimes be input manually. It is analogous to a binary parity bit used to check for errors in computer-generated data. It consists of one or more digits (or letters) computed by an algorithm from the other digits (or letters) in the sequence input. [1]

With a check digit, one can detect simple errors in the input of a series of characters (usually digits) such as a single mistyped digit or some permutations of two successive digits.

Design

Check digit algorithms are generally designed to capture human transcription errors, such as a single mistyped digit or the transposition of two adjacent digits. [2]

In choosing a system, a high probability of catching errors is traded off against implementation difficulty; simple check digit systems are easily understood and implemented by humans but do not catch as many errors as complex ones, which require sophisticated programs to implement.

A desirable feature is that left-padding with zeros should not change the check digit. This allows variable-length numbers to be used and the length to be changed. If a single check digit is added to the original number, the system will not always capture multiple errors, such as two replacement errors (12 → 34). Typically, though, double errors will be caught 90% of the time, since both changes would need to alter the output by offsetting amounts to go unnoticed.

A very simple check digit method would be to take the sum of all digits (digital sum) modulo 10. This would catch any single-digit error, as such an error would always change the sum, but does not catch any transposition errors (switching two digits) as re-ordering does not change the sum.
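The behaviour described above can be sketched in a few lines of Python. The function name `digit_sum_check` and the sample numbers are illustrative assumptions, not part of any standard.

```python
def digit_sum_check(number: str) -> int:
    """Return the sum of the decimal digits of `number`, modulo 10."""
    return sum(int(ch) for ch in number) % 10


original = "4871"
check = digit_sum_check(original)      # 4 + 8 + 7 + 1 = 20, so the check digit is 0

single_error = "4971"                  # second digit mistyped (8 -> 9)
transposed = "4817"                    # last two digits swapped

# A single wrong digit always changes the sum modulo 10 ...
single_error_detected = digit_sum_check(single_error) != check    # True
# ... but re-ordering the digits leaves the sum unchanged.
transposition_detected = digit_sum_check(transposed) != check     # False
```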

A slightly more complex method is to take the weighted sum of the digits, modulo 10, with different weights for each number position.

For example, if the weights for a four-digit number were 5, 3, 2, 7 and the number to be coded was 4871, then one would take 5×4 + 3×8 + 2×7 + 7×1 = 65, reduce it modulo 10 to get 5, and use 5 as the check digit, giving 48715.
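A minimal Python sketch of this weighted-sum calculation follows. The helper name `weighted_check_digit` is hypothetical; the weights and number are the ones from the example.

```python
def weighted_check_digit(number: str, weights: list[int]) -> int:
    """Weighted sum of the digits, modulo 10 (one weight per digit position)."""
    if len(number) != len(weights):
        raise ValueError("need exactly one weight per digit")
    return sum(w * int(d) for w, d in zip(weights, number)) % 10


check = weighted_check_digit("4871", [5, 3, 2, 7])   # 65 % 10 == 5
coded = "4871" + str(check)                          # "48715"
```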

Systems with weights of 1, 3, 7, or 9, with the weights on neighboring numbers being different, are widely used: for example, 31 31 weights in UPC codes, 13 13 weights in EAN numbers (GS1 algorithm), and the 371 371 371 weights used in United States bank routing transit numbers. This system detects all single-digit errors and around 90% [citation needed] of transposition errors. The weights 1, 3, 7, and 9 are used because they are coprime with 10, so changing any digit changes the check digit; a coefficient divisible by 2 or 5 would lose information (because 5×0 = 5×2 = 5×4 = 5×6 = 5×8 = 0 modulo 10) and thus fail to catch some single-digit errors. Using different weights on neighboring numbers means that most transpositions change the check digit. However, because all of these weights differ from one another by an even number, the scheme does not catch transpositions of two digits that differ by 5 (0 and 5, 1 and 6, 2 and 7, 3 and 8, 4 and 9): the change in the weighted sum is the even weight difference times 5, which is a multiple of 10.
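The blind spot for digits that differ by 5 can be checked by brute force. The sketch below assumes neighboring weights of 3 and 1 and treats every ordered pair of distinct digits as equally likely, which is a simplification; the names `detects_swap`, `missed` and `detection_rate` are illustrative.

```python
from itertools import permutations


def detects_swap(a: int, b: int, w_left: int = 3, w_right: int = 1) -> bool:
    """True if swapping adjacent digits a, b changes the weighted sum modulo 10."""
    before = (w_left * a + w_right * b) % 10
    after = (w_left * b + w_right * a) % 10
    return before != after


pairs = list(permutations(range(10), 2))          # 90 ordered pairs with a != b
missed = [(a, b) for a, b in pairs if not detects_swap(a, b)]
detection_rate = 1 - len(missed) / len(pairs)     # 80 / 90, roughly 0.89

# Every missed pair consists of two digits that differ by exactly 5.
assert all(abs(a - b) == 5 for a, b in missed)
```

Under these assumptions 10 of the 90 possible adjacent swaps go unnoticed, which is consistent with the figure of around 90% quoted above.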

The ISBN-10 code instead uses modulo 11, which is prime, and all the number positions have different weights 1, 2, ... 10. This system thus detects all single-digit substitution and transposition errors (including jump transpositions), but at the cost of the check digit possibly being 10, represented by "X". (An alternative is simply to avoid using the serial numbers which result in an "X" check digit.) ISBN-13 instead uses the GS1 algorithm used in EAN numbers.
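The claim that a prime modulus with distinct weights catches every single substitution and every transposition can be verified exhaustively. In this sketch the function names are illustrative, and character values run from 0 to 10 so that the "X" value is covered as well.

```python
MOD = 11
WEIGHTS = range(1, 11)     # the ten position weights 1, 2, ..., 10
VALUES = range(11)         # character values 0..10 (10 is written "X")


def all_substitutions_detected() -> bool:
    # Replacing value a by b at weight w shifts the sum by w * (b - a).
    return all(
        (w * (b - a)) % MOD != 0
        for w in WEIGHTS
        for a in VALUES
        for b in VALUES
        if a != b
    )


def all_transpositions_detected() -> bool:
    # Swapping values a and b held at weights wi and wj shifts the sum
    # by (wi - wj) * (b - a); this also covers jump transpositions.
    return all(
        ((wi - wj) * (b - a)) % MOD != 0
        for wi in WEIGHTS
        for wj in WEIGHTS
        if wi != wj
        for a in VALUES
        for b in VALUES
        if a != b
    )
```

Both functions return `True`: each shift is a product of two nonzero factors smaller than 11 in absolute value, and such a product cannot be a multiple of the prime 11.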

More complicated algorithms include the Luhn algorithm (1954), which captures 98% of adjacent-digit transposition errors (it does not detect 90 ↔ 09), and the still more sophisticated Verhoeff algorithm (1969), which catches all single-digit substitution and transposition errors, and many (but not all) more complex errors. The Damm algorithm (2004), another method based on abstract algebra, likewise detects all single-digit errors and all adjacent transposition errors. These three methods use a single check digit and will therefore fail to capture around 10% [citation needed] of more complex errors. To reduce this failure rate, it is necessary to use more than one check digit (for example, the modulo 97 check, which uses two check digits; for the algorithm, see International Bank Account Number) and/or to use a wider range of characters in the check digit, for example letters plus numbers.
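As an illustration of the first of these, here is a short Python sketch of the Luhn algorithm as it is commonly described (double every second digit counting from the right, subtract 9 from any result above 9, and require a total divisible by 10). The function names and the sample payload are assumptions made for the example.

```python
from itertools import permutations


def luhn_double(d: int) -> int:
    """Double a digit and fold two-digit results back into 0..9."""
    d *= 2
    return d - 9 if d > 9 else d


def luhn_check_digit(payload: str) -> int:
    """Check digit to append to `payload` (a string of decimal digits)."""
    total = 0
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        # Once the check digit is appended, these are the doubled positions.
        total += luhn_double(d) if i % 2 == 0 else d
    return (10 - total % 10) % 10


def luhn_is_valid(number: str) -> bool:
    return luhn_check_digit(number[:-1]) == int(number[-1])


# Adjacent transpositions that leave the Luhn sum unchanged:
missed = [
    (a, b)
    for a, b in permutations(range(10), 2)
    if (luhn_double(a) + b) % 10 == (luhn_double(b) + a) % 10
]
# missed == [(0, 9), (9, 0)], i.e. 88 of 90 swaps (about 98%) are caught
```

For instance, `luhn_check_digit("7992739871")` returns 3, so `luhn_is_valid("79927398713")` is `True`.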

Examples

UPC, EAN, GLN, GTIN, numbers administered by GS1

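The final digit of a Universal Product Code, International Article Number, Global Location Number or Global Trade Item Number is a check digit computed as follows (the positions are stated for a 12-digit UPC-A; in general, the weight of three falls on the digit immediately to the left of the check digit and on every second digit from there): [3] [4]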

  1. Add the digits in the odd-numbered positions from the left (first, third, fifth, etc.—not including the check digit) together and multiply by three.
  2. Add the digits (up to but not including the check digit) in the even-numbered positions (second, fourth, sixth, etc.) to the result.
  3. Take the remainder of the result divided by 10 (i.e. the modulo 10 operation). If the remainder is equal to 0 then use 0 as the check digit, and if not 0 subtract the remainder from 10 to derive the check digit.

A GS1 check digit calculator and detailed documentation are available online at GS1's website. [5] Another official calculator page shows that the mechanism for GTIN-13 is the same as for the Global Location Number (GLN). [6]

For instance, the UPC-A barcode for a box of tissues is "036000241457". The last digit is the check digit "7", and if the other numbers are correct then the check digit calculation must produce 7.

  1. Add the odd number digits: 0+6+0+2+1+5 = 14.
  2. Multiply the result by 3: 14 × 3 = 42.
  3. Add the even number digits: 3+0+0+4+4 = 11.
  4. Add the two results together: 42 + 11 = 53.
  5. To calculate the check digit, take the remainder of 53 / 10 (that is, 53 modulo 10) and, if it is not 0, subtract it from 10: 53 / 10 = 5 remainder 3, and 10 − 3 = 7. Therefore, the check digit value is 7.

As another example, consider calculating the check digit x for the food item code "01010101010x".

  1. Add the odd number digits: 0+0+0+0+0+0 = 0.
  2. Multiply the result by 3: 0 × 3 = 0.
  3. Add the even number digits: 1+1+1+1+1 = 5.
  4. Add the two results together: 0 + 5 = 5.
  5. To calculate the check digit, take the remainder of 5 / 10 (that is, 5 modulo 10) and, if it is not 0, subtract it from 10: 5 / 10 = 0 remainder 5, and 10 − 5 = 5. Therefore, the check digit x is 5.
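The procedure and both worked examples can be reproduced with a short Python sketch. The function name `gs1_check_digit` is an assumption. Weights are assigned from the right so that the digit next to the check digit is always multiplied by 3; for the 11 data digits of a UPC-A this matches the "odd positions from the left" wording used above.

```python
def gs1_check_digit(payload: str) -> int:
    """Check digit for a GS1-style number, given its digits without the check digit."""
    total = 0
    for i, ch in enumerate(reversed(payload)):
        weight = 3 if i % 2 == 0 else 1    # rightmost data digit gets weight 3
        total += weight * int(ch)
    return (10 - total % 10) % 10          # a remainder of 0 maps to check digit 0


tissues = gs1_check_digit("03600024145")   # 3*14 + 11 = 53, so the result is 7
food = gs1_check_digit("01010101010")      # 3*0 + 5 = 5, so the result is 5
```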

ISBN 10

The final character of a ten-digit International Standard Book Number is a check digit computed so that multiplying each digit by its position in the number (counting from the right) and taking the sum of these products modulo 11 gives 0. The rightmost digit (which is multiplied by 1) is the check digit, chosen to make the sum correct. It may need to have the value 10, which is represented as the letter X. For example, take the ISBN 0-201-53082-1: the sum of products is 0×10 + 2×9 + 0×8 + 1×7 + 5×6 + 3×5 + 0×4 + 8×3 + 2×2 + 1×1 = 99 ≡ 0 (mod 11), so the ISBN is valid. Positions can also be counted from the left, in which case the check digit is multiplied by 10; the validity check then reads 0×1 + 2×2 + 0×3 + 1×4 + 5×5 + 3×6 + 0×7 + 8×8 + 2×9 + 1×10 = 143 ≡ 0 (mod 11).
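A minimal Python sketch of this rule follows, using weights counted from the right. The function names are hypothetical, and hyphens and spaces are simply ignored.

```python
def isbn10_is_valid(isbn: str) -> bool:
    """Validate an ISBN-10 given with or without hyphens."""
    chars = [c for c in isbn if c not in "- "]
    if len(chars) != 10:
        return False
    total = 0
    for weight, ch in zip(range(10, 0, -1), chars):
        if ch in "Xx" and weight == 1:     # "X" (value 10) only as the check character
            value = 10
        elif ch.isdigit():
            value = int(ch)
        else:
            return False
        total += weight * value
    return total % 11 == 0


def isbn10_check_char(first_nine: str) -> str:
    """Check character that completes nine leading digits to a valid ISBN-10."""
    total = sum(w * int(d) for w, d in zip(range(10, 1, -1), first_nine))
    value = (-total) % 11                  # makes total + 1 * value divisible by 11
    return "X" if value == 10 else str(value)
```

With the example above, `isbn10_is_valid("0-201-53082-1")` is `True` and `isbn10_check_char("020153082")` returns `"1"`. The nine digits `080442957` (chosen here only because their check value is 10) yield `"X"`.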

ISBN 13

ISBN 13 (in use since January 2007) is equal to the EAN-13 code found underneath a book's barcode. Its check digit is generated in the same way as for the UPC. [7]
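As a sketch, the ISBN 13 check digit can be computed with the alternating 1, 3 weights of EAN-13, counted from the left over the first twelve digits. The function name is an assumption; the sample value is the ISBN 978-0-88385-720-5 that appears in this article's reference list.

```python
def isbn13_check_digit(first_twelve: str) -> int:
    """EAN-13 style check digit for the first twelve digits of an ISBN 13."""
    total = sum(
        (1 if i % 2 == 0 else 3) * int(d)   # weights 1, 3, 1, 3, ... from the left
        for i, d in enumerate(first_twelve)
    )
    return (10 - total % 10) % 10


check = isbn13_check_digit("978088385720")   # weighted sum 125, so the result is 5
```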

NCDA

The NOID Check Digit Algorithm (NCDA), [8] in use since 2004, is designed for application in persistent identifiers and works with variable-length strings of letters and digits, called extended digits. It is widely used with the ARK identifier scheme and, to a lesser extent, with schemes such as the Handle System and DOI. An extended digit is constrained to betanumeric characters, which are alphanumerics minus vowels and the letter 'l' (ell). This restriction helps when generating opaque strings that are unlikely to form words by accident and will not contain both O and 0, or l and 1. Having a prime radix of R=29, the betanumeric repertoire permits the algorithm to guarantee detection of single-character and transposition errors [9] for strings less than R=29 characters in length (beyond which it provides a slightly weaker check). The algorithm generalizes to any character repertoire with a prime radix R and strings less than R characters in length.
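The following Python sketch shows one way such a check character could be computed. The details are assumptions based on the NOID documentation rather than on this article: each character's value is its index in the betanumeric alphabet, it is weighted by its 1-based position, and characters outside the alphabet (such as "/") count as zero. The sample identifier is likewise assumed for illustration.

```python
# Betanumeric repertoire: digits, then consonants without "l" (and without "y").
XDIGITS = "0123456789bcdfghjkmnpqrstvwxz"    # 29 characters, so the radix is 29


def ncda_check_char(identifier: str) -> str:
    """Check character under the assumed position-weighted scheme."""
    total = 0
    for position, ch in enumerate(identifier, start=1):
        value = XDIGITS.find(ch)
        if value < 0:                        # assumption: other characters count as 0
            value = 0
        total += position * value
    return XDIGITS[total % len(XDIGITS)]


check = ncda_check_char("13030/xf93gt2")     # weighted sum 891, 891 % 29 == 21 -> "q"
```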

Other examples of check digits

In Central America

  • The Guatemalan Tax Number (NIT – Número de Identificación Tributaria) based on modulo 11.

References

  1. "What is Check Digit? - Definition from Techopedia". Techopedia.com. 20 July 2016. Retrieved 2022-03-16.
  2. Kirtland, Joseph (2001). Identification Numbers and Check Digit Schemes. Classroom Resource Materials. Mathematical Association of America. pp. 4–6. ISBN 978-0-88385-720-5.
  3. "GS1 Check Digit Calculator". GS1 US. 2006. Archived from the original on 2008-05-09. Retrieved 2008-05-21.
  4. "How to calculate a check digit manually". GS1. 5 November 2024.
  5. "Check Digit Calculator". GS1. 2005. Retrieved 2008-05-21.
  6. "Check Digit Calculator, at GS1 US official site". GS1 US. Archived from the original on 2016-11-21. Retrieved 2012-08-09.
  7. "ISBN Users Manual". International ISBN Agency. 2005. Archived from the original on 2014-04-29. Retrieved 2008-05-21.
  8. Kunze, John A. "noid - Nice Opaque Identifier Generator commands". metacpan.org. Archived from the original on 2022-05-22. Retrieved 2022-10-15.
  9. Bressoud, David; Wagon, Stan (2000). Computational Number Theory. Key College Publishing.
  10. "OpenFIGI: Unlock the Power of Efficiency with Open Symbology". OpenFIGI. Archived from the original on 2022-08-09. Retrieved 2022-10-15.
  11. "Unique Identification Card" (PDF). Geek Gazette. Autumn 2011. p. 16. Archived from the original (PDF) on 2014-06-26.
  12. Chong-Yee Khoo (20 January 2014). "New Format for Singapore IP Application Numbers at IPOS". Singapore Patent Blog. Cantab IP. Archived from the original on 14 July 2014. Retrieved 6 July 2014.