Luhn mod N algorithm

Last updated

The Luhn mod N algorithm is an extension to the Luhn algorithm (also known as mod 10 algorithm) that allows it to work with sequences of values in any even-numbered base. This can be useful when a check digit is required to validate an identification string composed of letters, a combination of letters and digits or any arbitrary set of N characters where N is divisible by 2.

Contents

Informal explanation

The Luhn mod N algorithm generates a check digit (more precisely, a check character) within the same range of valid characters as the input string. For example, if the algorithm is applied to a string of lower-case letters (a to z), the check character will also be a lower-case letter. Apart from this distinction, it resembles very closely the original algorithm.

The main idea behind the extension is that the full set of valid input characters is mapped to a list of code-points (i.e., sequential integers beginning with zero). The algorithm processes the input string by converting each character to its associated code-point and then performing the computations in mod N (where N is the number of valid input characters). Finally, the resulting check code-point is mapped back to obtain its corresponding check character.

Limitation

The Luhn mod N algorithm only works where N is divisible by 2. This is because there is an operation to correct the value of a position after doubling its value which does not work where N is not divisible by 2. For applications using the English alphabet this is not a problem, since a string of lower-case letters has 26 code-points, and adding Decimal characters adds a further 10, maintaining an N divisible by 2.

Explanation

The second step in the Luhn algorithm re-packs the doubled value of a position into the original digit's base by adding together the individual digits in the doubled value when written in base N. This step results in even numbers if the doubled value is less than or equal to N, and odd numbers if the doubled value is greater than N. For example, in Decimal applications where N is 10, original values between 0 and 4 result in even numbers and original values between 5 and 9 result in odd numbers, effectively re-packing the doubled values between 0 and 18 into a single distinct result between 0 and 9.

Where an N is used that is not divisible by 2 this step returns even numbers for doubled values greater than N which cannot be distinguished from doubled values less than or equal to N.

Outcome

The algorithm will neither detect all single-digit errors nor all transpositions of adjacent digits if an N is used that is not divisible by 2. As these detection capabilities are the algorithm's primary strengths, the algorithm is weakened almost entirely by this limitation. The Luhn mod N algorithm odd variation enables applications where N is not divisible by 2 by replacing the doubled value at each position with the remainder after dividing the position's value by N which gives odd number remainders consistent with the original algorithm design.

Mapping characters to code-points

Initially, a mapping between valid input characters and code-points must be created. For example, consider that the valid characters are the lower-case letters from a to f. Therefore, a suitable mapping would be:

Characterabcdef
Code-point012345

Note that the order of the characters is completely irrelevant. This other mapping would also be acceptable (although possibly more cumbersome to implement):

Characterceafbd
Code-point012345

It is also possible to intermix letters and digits (and possibly even other characters). For example, this mapping would be appropriate for lower-case hexadecimal digits:

Character0123456789abcdef
Code-point0123456789101112131415

Algorithm in C#

Assuming the following functions are defined:

/// <summary>/// This can be any string of characters./// </summary>privateconststringCodePoints="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";privateintNumberOfValidInputCharacters()=>CodePoints.Length;privateintCodePointFromCharacter(charcharacter)=>CodePoints.IndexOf(character);privatecharCharacterFromCodePoint(intcodePoint)=>CodePoints[codePoint];

The function to generate a check character is:

charGenerateCheckCharacter(stringinput){intfactor=2;intsum=0;intn=NumberOfValidInputCharacters();// Starting from the right and working leftwards is easier since// the initial "factor" will always be "2".for(inti=input.Length-1;i>=0;i--){intcodePoint=CodePointFromCharacter(input[i]);intaddend=factor*codePoint;// Alternate the "factor" that each "codePoint" is multiplied byfactor=(factor==2)?1:2;// Sum the digits of the "addend" as expressed in base "n"addend=IntegerValue(addend/n)+(addend%n);sum+=addend;}// Calculate the number that must be added to the "sum"// to make it divisible by "n".intremainder=sum%n;intcheckCodePoint=(n-remainder)%n;returnCharacterFromCodePoint(checkCodePoint);}

And the function to validate a string (with the check character as the last character) is:

boolValidateCheckCharacter(stringinput){intfactor=1;intsum=0;intn=NumberOfValidInputCharacters();// Starting from the right, work leftwards// Now, the initial "factor" will always be "1"// since the last character is the check character.for(inti=input.Length-1;i>=0;i--){intcodePoint=CodePointFromCharacter(input[i]);intaddend=factor*codePoint;// Alternate the "factor" that each "codePoint" is multiplied byfactor=(factor==2)?1:2;// Sum the digits of the "addend" as expressed in base "n"addend=IntegerValue(addend/n)+(addend%n);sum+=addend;}intremainder=sum%n;return(remainder==0);}

Algorithm in Java

Assuming the following functions are defined:

intcodePointFromCharacter(charcharacter){...}charcharacterFromCodePoint(intcodePoint){...}intnumberOfValidInputCharacters(){...}

The function to generate a check character is:

chargenerateCheckCharacter(Stringinput){intfactor=2;intsum=0;intn=numberOfValidInputCharacters();// Starting from the right and working leftwards is easier since// the initial "factor" will always be "2".for(inti=input.length()-1;i>=0;i--){intcodePoint=codePointFromCharacter(input.charAt(i));intaddend=factor*codePoint;// Alternate the "factor" that each "codePoint" is multiplied byfactor=(factor==2)?1:2;// Sum the digits of the "addend" as expressed in base "n"addend=(addend/n)+(addend%n);sum+=addend;}// Calculate the number that must be added to the "sum"// to make it divisible by "n".intremainder=sum%n;intcheckCodePoint=(n-remainder)%n;returncharacterFromCodePoint(checkCodePoint);}

And the function to validate a string (with the check character as the last character) is:

booleanvalidateCheckCharacter(Stringinput){intfactor=1;intsum=0;intn=numberOfValidInputCharacters();// Starting from the right, work leftwards// Now, the initial "factor" will always be "1"// since the last character is the check character.for(inti=input.length()-1;i>=0;i--){intcodePoint=codePointFromCharacter(input.charAt(i));intaddend=factor*codePoint;// Alternate the "factor" that each "codePoint" is multiplied byfactor=(factor==2)?1:2;// Sum the digits of the "addend" as expressed in base "n"addend=(addend/n)+(addend%n);sum+=addend;}intremainder=sum%n;return(remainder==0);}

Algorithm in JavaScript

Assuming the following functions are defined:

constcodePoints="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";//This can be any string of permitted charactersfunctionnumberOfValidInputCharacters(){returncodePoints.length;}functioncodePointFromCharacter(character){returncodePoints.indexOf(character);}functioncharacterFromCodePoint(codePoint){returncodePoints.charAt(codePoint);}

The function to generate a check character is:

functiongenerateCheckCharacter(input){letfactor=2;letsum=0;letn=numberOfValidInputCharacters();// Starting from the right and working leftwards is easier since// the initial "factor" will always be "2".for(leti=input.length-1;i>=0;i--){letcodePoint=codePointFromCharacter(input.charAt(i));letaddend=factor*codePoint;// Alternate the "factor" that each "codePoint" is multiplied byfactor=(factor==2)?1:2;// Sum the digits of the "addend" as expressed in base "n"addend=(Math.floor(addend/n))+(addend%n);sum+=addend;}// Calculate the number that must be added to the "sum"// to make it divisible by "n".letremainder=sum%n;letcheckCodePoint=(n-remainder)%n;returncharacterFromCodePoint(checkCodePoint);}

And the function to validate a string (with the check character as the last character) is:

functionvalidateCheckCharacter(input){letfactor=1;letsum=0;letn=numberOfValidInputCharacters();// Starting from the right, work leftwards// Now, the initial "factor" will always be "1"// since the last character is the check character.for(leti=input.length-1;i>=0;i--){letcodePoint=codePointFromCharacter(input.charAt(i));letaddend=factor*codePoint;// Alternate the "factor" that each "codePoint" is multiplied byfactor=(factor==2)?1:2;// Sum the digits of the "addend" as expressed in base "n"addend=(Math.floor(addend/n))+(addend%n);sum+=addend;}letremainder=sum%n;return(remainder==0);}

Example

Generation

Consider the above set of valid input characters and the example input string abcdef. To generate the check character, start with the last character in the string and move left doubling every other code-point. The "digits" of the code-points as written in base 6 (since there are 6 valid input characters) should then be summed up:

Characterabcdef
Code-point012345
Double26 (base 10)
10 (base 6)
10 (base 10)
14 (base 6)
Reduce0221 + 041 + 4
Sum of digits022145

The total sum of digits is 14 (0 + 2 + 2 + 1 + 4 + 5). The number that must be added to obtain the next multiple of 6 (in this case, 18) is 4. This is the resulting check code-point. The associated check character is e.

Validation

The resulting string abcdefe can then be validated by using a similar procedure:

Characterabcdefe
Code-point0123454
Double26 (base 10)
10 (base 6)
10 (base 10)
14 (base 6)
Reduce0221 + 041 + 44
Sum of digits0221454

The total sum of digits is 18. Since it is divisible by 6, the check character is valid.

Implementation

The mapping of characters to code-points and back can be implemented in a number of ways. The simplest approach (akin to the original Luhn algorithm) is to use ASCII code arithmetic. For example, given an input set of 0 to 9, the code-point can be calculated by subtracting the ASCII code for '0' from the ASCII code of the desired character. The reverse operation will provide the reverse mapping. Additional ranges of characters can be dealt with by using conditional statements.

Non-sequential sets can be mapped both ways using a hard-coded switch/case statement. A more flexible approach is to use something similar to an associative array. For this to work, a pair of arrays is required to provide the two-way mapping.

An additional possibility is to use an array of characters where the array indexes are the code-points associated with each character. The mapping from character to code-point can then be performed with a linear or binary search. In this case, the reverse mapping is just a simple array lookup.

Weakness

This extension shares the same weakness as the original algorithm, namely, it cannot detect the transposition of the sequence <first-valid-character><last-valid-character> to <last-valid-character><first-valid-character> (or vice versa). This is equivalent to the transposition of 09 to 90 (assuming a set of valid input characters from 0 to 9 in order). On a positive note, the larger the set of valid input characters, the smaller the impact of the weakness.

See also

Related Research Articles

<span class="mw-page-title-main">Checksum</span> Data used to detect errors in other data

A checksum is a small-sized block of data derived from another block of digital data for the purpose of detecting errors that may have been introduced during its transmission or storage. By themselves, checksums are often used to verify data integrity but are not relied upon to verify data authenticity.

<span class="mw-page-title-main">Hash function</span> Mapping arbitrary data to fixed-size values

A hash function is any function that can be used to map data of arbitrary size to fixed-size values, though there are some hash functions that support variable length output. The values returned by a hash function are called hash values, hash codes, digests, or simply hashes. The values are usually used to index a fixed-size table called a hash table. Use of a hash function to index a hash table is called hashing or scatter storage addressing.

An International Securities Identification Number (ISIN) is a code that uniquely identifies a security globally for the purposes of facilitating clearing, reporting and settlement of trades. Its structure is defined in ISO 6166. The ISIN code is a 12-character alphanumeric code that serves for uniform identification of a security through normalization of the assigned National Number, where one exists, at trading and settlement.

Punycode is a representation of Unicode with the limited ASCII character subset used for Internet hostnames. Using Punycode, host names containing Unicode characters are transcoded to a subset of ASCII consisting of letters, digits, and hyphens, which is called the letter–digit–hyphen (LDH) subset. For example, München is encoded as Mnchen-3ya.

<span class="mw-page-title-main">Pointer (computer programming)</span> Object which stores memory addresses in a computer program

In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable is dependent on the underlying computer architecture.

A check digit is a form of redundancy check used for error detection on identification numbers, such as bank account numbers, which are used in an application where they will at least sometimes be input manually. It is analogous to a binary parity bit used to check for errors in computer-generated data. It consists of one or more digits computed by an algorithm from the other digits in the sequence input.

The Luhn algorithm or Luhn formula, also known as the "modulus 10" or "mod 10" algorithm, named after its creator, IBM scientist Hans Peter Luhn, is a simple check digit formula used to validate a variety of identification numbers.

SEDOL stands for Stock Exchange Daily Official List, a list of security identifiers used in the United Kingdom and Ireland for clearing purposes. The numbers are assigned by the London Stock Exchange, on request by the security issuer. SEDOLs serve as the National Securities Identifying Number for all securities issued in the United Kingdom and are therefore part of the security's ISIN as well. The SEDOL Masterfile (SMF) provides reference data on millions of global multi-asset securities each uniquely identified at the market level using a universal SEDOL code.

<span class="mw-page-title-main">Code 128</span> Barcode format

Code 128 is a high-density linear barcode symbology defined in ISO/IEC 15417:2007. It is used for alphanumeric or numeric-only barcodes. It can encode all 128 characters of ASCII and, by use of an extension symbol (FNC4), the Latin-1 characters defined in ISO/IEC 8859-1.. It generally results in more compact barcodes compared to other methods like Code 39, especially when the texts contain mostly digits. Code 128 was developed by the Computer Identics Corporation in 1981.

In mathematics, the digit sum of a natural number in a given number base is the sum of all its digits. For example, the digit sum of the decimal number would be

The computer programming languages C and Pascal have similar times of origin, influences, and purposes. Both were used to design their own compilers early in their lifetimes. The original Pascal definition appeared in 1969 and a first compiler in 1970. The first version of C appeared in 1972.

In number theory, a narcissistic number in a given number base is a number that is the sum of its own digits each raised to the power of the number of digits.

<span class="mw-page-title-main">Recursion (computer science)</span> Use of functions that call themselves

In computer science, recursion is a method of solving a computational problem where the solution depends on solutions to smaller instances of the same problem. Recursion solves such recursive problems by using functions that call themselves from within their own code. The approach can be applied to many types of problems, and recursion is one of the central ideas of computer science.

The power of recursion evidently lies in the possibility of defining an infinite set of objects by a finite statement. In the same manner, an infinite number of computations can be described by a finite recursive program, even if this program contains no explicit repetitions.

In computer programming, a semipredicate problem occurs when a subroutine intended to return a useful value can fail, but the signalling of failure uses an otherwise valid return value. The problem is that the caller of the subroutine cannot tell what the result means in this case.

A mobile equipment identifier (MEID) is a globally unique number identifying a physical piece of CDMA2000 mobile station equipment. The number format is defined by the 3GPP2 report S.R0048 but in practical terms, it can be seen as an IMEI but with hexadecimal digits.

<span class="mw-page-title-main">MSI Barcode</span> Barcode symbology

MSI is a barcode symbology developed by the MSI Data Corporation, based on the original Plessey Code symbology. It is a continuous symbology that is not self-checking. MSI is used primarily for inventory control, marking storage containers and shelves in warehouse environments.

A negative base may be used to construct a non-standard positional numeral system. Like other place-value systems, each position holds multiples of the appropriate power of the system's base; but that base is negative—that is to say, the base b is equal to −r for some natural number r.

Secure coding is the practice of developing computer software in such a way that guards against the accidental introduction of security vulnerabilities. Defects, bugs and logic flaws are consistently the primary cause of commonly exploited software vulnerabilities. Through the analysis of thousands of reported vulnerabilities, security professionals have discovered that most vulnerabilities stem from a relatively small number of common software programming errors. By identifying the insecure coding practices that lead to these errors and educating developers on secure alternatives, organizations can take proactive steps to help significantly reduce or eliminate vulnerabilities in software before deployment.

Escape sequences are used in the programming languages C and C++, and their design was copied in many other languages such as Java, PHP, C#, etc. An escape sequence is a sequence of characters that does not represent itself when used inside a character or string literal, but is translated into another character or a sequence of characters that may be difficult or impossible to represent directly.

SUPER BASIC, sometimes SBASIC for short, is an advanced dialect of the BASIC programming language offered on Tymshare's SDS 940 systems starting in 1968 and available well into the 1970s.