A check digit is a form of redundancy check used for error detection on identification numbers, such as bank account numbers, which are used in an application where they will at least sometimes be input manually. It is analogous to a binary parity bit used to check for errors in computer-generated data. It consists of one or more digits (or letters) computed by an algorithm from the other digits (or letters) in the sequence input. [1]
With a check digit, one can detect simple errors in the input of a series of characters (usually digits) such as a single mistyped digit or some permutations of two successive digits.
Check digit algorithms are generally designed to capture human transcription errors. In order of complexity, these include the following: [2]
In choosing a system, a high probability of catching errors is traded off against implementation difficulty; simple check digit systems are easily understood and implemented by humans but do not catch as many errors as complex ones, which require sophisticated programs to implement.
A desirable feature is that left-padding with zeros should not change the check digit. This allows variable length numbers to be used and the length to be changed. If there is a single check digit added to the original number, the system will not always capture multiple errors, such as two replacement errors (12 → 34) though, typically, double errors will be caught 90% of the time (both changes would need to change the output by offsetting amounts).
A very simple check digit method would be to take the sum of all digits (digital sum) modulo 10. This would catch any single-digit error, as such an error would always change the sum, but does not catch any transposition errors (switching two digits) as re-ordering does not change the sum.
A slightly more complex method is to take the weighted sum of the digits, modulo 10, with different weights for each number position.
To illustrate this, for example if the weights for a four digit number were 5, 3, 2, 7 and the number to be coded was 4871, then one would take 5×4 + 3×8 + 2×7 + 7×1 = 65, i.e. 65 modulo 10, and the check digit would be 5, giving 48715.
Systems with weights of 1, 3, 7, or 9, with the weights on neighboring numbers being different, are widely used: for example, 31 31 weights in UPC codes, 13 13 weights in EAN numbers (GS1 algorithm), and the 371 371 371 weights used in United States bank routing transit numbers. This system detects all single-digit errors and around 90% [ citation needed ]of transposition errors. 1, 3, 7, and 9 are used because they are coprime with 10, so changing any digit changes the check digit; using a coefficient that is divisible by 2 or 5 would lose information (because 5×0 = 5×2 = 5×4 = 5×6 = 5×8 = 0 modulo 10) and thus not catch some single-digit errors. Using different weights on neighboring numbers means that most transpositions change the check digit; however, because all weights differ by an even number, this does not catch transpositions of two digits that differ by 5 (0 and 5, 1 and 6, 2 and 7, 3 and 8, 4 and 9), since the 2 and 5 multiply to yield 10.
The ISBN-10 code instead uses modulo 11, which is prime, and all the number positions have different weights 1, 2, ... 10. This system thus detects all single-digit substitution and transposition errors (including jump transpositions), but at the cost of the check digit possibly being 10, represented by "X". (An alternative is simply to avoid using the serial numbers which result in an "X" check digit.) ISBN-13 instead uses the GS1 algorithm used in EAN numbers.
More complicated algorithms include the Luhn algorithm (1954), which captures 98% of single-digit transposition errors (it does not detect 90 ↔ 09) and the still more sophisticated Verhoeff algorithm (1969), which catches all single-digit substitution and transposition errors, and many (but not all) more complex errors. Similar is another abstract algebra-based method, the Damm algorithm (2004), that too detects all single-digit errors and all adjacent transposition errors. These three methods use a single check digit and will therefore fail to capture around 10%[ citation needed ] of more complex errors. To reduce this failure rate, it is necessary to use more than one check digit (for example, the modulo 97 check referred to below, which uses two check digits—for the algorithm, see International Bank Account Number) and/or to use a wider range of characters in the check digit, for example letters plus numbers.
The final digit of a Universal Product Code, International Article Number, Global Location Number or Global Trade Item Number is a check digit computed as follows: [3] [4]
A GS1 check digit calculator and detailed documentation is online at GS1's website. [5] Another official calculator page shows that the mechanism for GTIN-13 is the same for Global Location Number/GLN. [6]
For instance, the UPC-A barcode for a box of tissues is "036000241457". The last digit is the check digit "7", and if the other numbers are correct then the check digit calculation must produce 7.
Another example: to calculate the check digit for the following food item "01010101010x".
The final character of a ten-digit International Standard Book Number is a check digit computed so that multiplying each digit by its position in the number (counting from the right) and taking the sum of these products modulo 11 is 0. The digit the farthest to the right (which is multiplied by 1) is the check digit, chosen to make the sum correct. It may need to have the value 10, which is represented as the letter X. For example, take the ISBN 0-201-53082-1: The sum of products is 0×10 + 2×9 + 0×8 + 1×7 + 5×6 + 3×5 + 0×4 + 8×3 + 2×2 + 1×1 = 99 ≡ 0 (mod 11). So the ISBN is valid. Positions can also be counted from left, in which case the check digit is multiplied by 10, to check validity: 0×1 + 2×2 + 0×3 + 1×4 + 5×5 + 3×6 + 0×7 + 8×8 + 2×9 + 1×10 = 143 ≡ 0 (mod 11).
ISBN 13 (in use January 2007) is equal to the EAN-13 code found underneath a book's barcode. Its check digit is generated the same way as the UPC. [7]
The NOID Check Digit Algorithm (NCDA), [8] in use since 2004, is designed for application in persistent identifiers and works with variable length strings of letters and digits, called extended digits. It is widely used with the ARK identifier scheme and somewhat used with schemes, such as the Handle System and DOI. An extended digit is constrained to betanumeric characters, which are alphanumerics minus vowels and the letter 'l' (ell). This restriction helps when generating opaque strings that are unlikely to form words by accident and will not contain both O and 0, or l and 1. Having a prime radix of R=29, the betanumeric repertoire permits the algorithm to guarantee detection of single-character and transposition errors [9] for strings less than R=29 characters in length (beyond which it provides a slightly weaker check). The algorithm generalizes to any character repertoire with a prime radix R and strings less than R characters in length.
Notable algorithms include:
The International Standard Book Number (ISBN) is a numeric commercial book identifier that is intended to be unique. Publishers purchase or receive ISBNs from an affiliate of the International ISBN Agency.
The International Bank Account Number (IBAN) is an internationally agreed upon system of identifying bank accounts across national borders to facilitate the communication and processing of cross border transactions with a reduced risk of transcription errors. An IBAN uniquely identifies the account of a customer at a financial institution. It was originally adopted by the European Committee for Banking Standards (ECBS) and since 1997 as the international standard ISO 13616 under the International Organization for Standardization (ISO). The current version is ISO 13616:2020, which indicates the Society for Worldwide Interbank Financial Telecommunication (SWIFT) as the formal registrar. Initially developed to facilitate payments within the European Union, it has been implemented by most European countries and numerous countries in other parts of the world, mainly in the Middle East and the Caribbean. By July 2024, 88 countries were using the IBAN numbering system.
The Universal Product Code is a barcode symbology that is used worldwide for tracking trade items in stores.
An International Standard Serial Number (ISSN) is an eight-digit serial number used to uniquely identify a serial publication (periodical), such as a magazine. The ISSN is especially helpful in distinguishing between serials with the same title. ISSNs are used in ordering, cataloging, interlibrary loans, and other practices in connection with serial literature.
An International Securities Identification Number (ISIN) is a code that uniquely identifies a security globally for the purposes of facilitating clearing, reporting and settlement of trades. Its structure is defined in ISO 6166. The ISIN code is a 12-character alphanumeric code that serves for uniform identification of a security through normalization of the assigned National Number, where one exists, at trading and settlement.
The Luhn algorithm or Luhn formula, also known as the "modulus 10" or "mod 10" algorithm, named after its creator, IBM scientist Hans Peter Luhn, is a simple check digit formula used to validate a variety of identification numbers. It is described in US patent 2950048A, granted on 23 August 1960.
SEDOL stands for Stock Exchange Daily Official List, a list of security identifiers used in the United Kingdom and Ireland for clearing purposes. The numbers are assigned by the London Stock Exchange, on request by the security issuer. SEDOLs serve as the National Securities Identifying Number for all securities issued in the United Kingdom and are therefore part of the security's ISIN as well. The SEDOL Masterfile (SMF) provides reference data on millions of global multi-asset securities each uniquely identified at the market level using a universal SEDOL code.
Code 128 is a high-density linear barcode symbology defined in ISO/IEC 15417:2007. It is used for alphanumeric or numeric-only barcodes. It can encode all 128 characters of ASCII and, by use of an extension symbol (FNC4), the Latin-1 characters defined in ISO/IEC 8859-1. It generally results in more compact barcodes compared to other methods like Code 39, especially when the texts contain mostly digits. Code 128 was developed by the Computer Identics Corporation in 1981.
Interleaved 2 of 5 (ITF) is a continuous two-width barcode symbology encoding digits. It is used commercially on 135 film, for ITF-14 barcodes, and on cartons of some products, while the products inside are labeled with UPC or EAN. ITF was created by David Allais, who also invented barcodes Code 39, Code 11, Code 93, and Code 49.
ISO 6346 is an international standard covering the coding, identification and marking of intermodal (shipping) containers used within containerized intermodal freight transport by the International Organization for Standardization (ISO). The standard establishes a visual identification system for every container that includes a unique serial number, the owner, a country code, a size, type and equipment category as well as any operational marks. The register of container owners is managed by the International Container Bureau (BIC).
The Global Trade Item Number (GTIN) is an identifier for trade items, developed by the international organization GS1. Such identifiers are used to look up product information in a database which may belong to a retailer, manufacturer, collector, researcher, or other entity. The uniqueness and universality of the identifier is useful in establishing which product in one database corresponds to which product in another database, especially across organizational boundaries.
GS1 is a not-for-profit, international organization developing and maintaining its own standards for barcodes and the corresponding issue company prefixes. The best known of these standards is the barcode, a symbol printed on products that can be scanned electronically.
The International Article Number is a standard describing a barcode symbology and numbering system used in global trade to identify a specific retail product type, in a specific packaging configuration, from a specific manufacturer. The standard has been subsumed in the Global Trade Item Number standard from the GS1 organization; the same numbers can be referred to as GTINs and can be encoded in other barcode symbologies, defined by GS1. EAN barcodes are used worldwide for lookup at retail point of sale, but can also be used as numbers for other purposes such as wholesale ordering or accounting. These barcodes only represent the digits 0–9, unlike some other barcode symbologies which can represent additional characters.
"Bookland" is the informal name for the Unique Country Code (UCC) prefix allocated in the 1980s for European Article Number (EAN) identifiers of published books, regardless of country of origin, so that the EAN namespace can catalogue books by ISBN rather than maintaining a redundant parallel numbering system. In other words, Bookland is a fictitious country that exists solely in EAN for the purposes of non-geographically cataloguing books in the otherwise geographically keyed EAN coding system.
The unified civil number is a 10-digit unique number assigned to each Bulgarian citizen. It serves as a national identification number. An EGN is assigned to Bulgarians at birth, or when a birth certificate is issued. The uniform system for civil registration and administrative service of population regulates the EGN system.
The Luhn mod N algorithm is an extension to the Luhn algorithm that allows it to work with sequences of values in any even-numbered base. This can be useful when a check digit is required to validate an identification string composed of letters, a combination of letters and digits or any arbitrary set of N characters where N is divisible by 2.
MSI is a barcode symbology developed by the MSI Data Corporation, based on the original Plessey Code symbology. It is a continuous symbology that is not self-checking. MSI is used primarily for inventory control, marking storage containers and shelves in warehouse environments.
An EAN-8 is an EAN/UPC symbology barcode and is derived from the longer International Article Number (EAN-13) code. It was introduced for use on small packages where an EAN-13 barcode would be too large; for example on cigarettes, pencils, and chewing gum packets. It is encoded identically to the 12 digits of the UPC-A barcode, except that it has 4 digits in each of the left and right halves.
The Global Location Number (GLN) is part of the GS1 systems of standards. It is a simple tool used to identify a location and can identify locations uniquely where required. This identifier is compliant with norm ISO/IEC 6523.
The GS1 Databar Coupon code has been in use in retail industry since the mid-1980s. At first, it was a UPC with system ID 5. Since UPCs cannot hold more than 12 digits, it required another barcode to hold additional information like offer code, expiration date and household ID numbers. Therefore, the code was often extended with an additional UCC/EAN 128 barcode. EAN 13 was sometimes used instead of UPC, and because it starts with 99, it was called the EAN 99 coupon barcode, and subsequently GS1 DataBar. After more than 20 years in use, there is now a need to encode more data for complex coupons, and to accommodate longer company IDs, so the traditional coupon code has become less efficient and sometimes not usable at all.