Aztec Code

Encoding: "This is an example Aztec symbol for Wikipedia."

The Aztec Code is a matrix code invented by Andrew Longacre, Jr. and Robert Hussey in 1995. [1] The code was published by AIM, Inc. in 1997. Although the Aztec Code was patented, the patent was officially dedicated to the public domain. [2] The Aztec Code is also published as the ISO/IEC 24778:2024 standard. Named after the resemblance of the central finder pattern to an Aztec pyramid, Aztec Code can use less space than other matrix barcodes because it does not require a surrounding blank "quiet zone".

Structure

The symbol is built on a square grid with a bull's-eye pattern at its centre for locating the code. Data is encoded in concentric square rings around the bull's-eye pattern. The central bull's-eye is 9×9 or 13×13 pixels, and one row of pixels around that encodes basic coding parameters, producing a "core" of 11×11 or 15×15 squares. Data is added in "layers", each one containing two rings of pixels, giving total sizes of 15×15, 19×19, 23×23, etc.

The corners of the core include orientation marks, allowing the code to be read if rotated or reflected. Decoding begins at the corner with three black pixels, and proceeds clockwise to the corners with two, one, and zero black pixels. The variable pixels in the central core encode the size, so it is not necessary to mark the boundary of the code with a blank "quiet zone", although some barcode readers require one.

The compact Aztec code core may be surrounded by 1 to 4 layers, producing symbols from 15×15 (room for 13 digits or 12 letters) through 27×27. There is additionally a special 11×11 "rune" that encodes one byte of information. The full core supports up to 32 layers, 151×151 pixels, which can encode 3832 digits, 3067 letters, or 1914 bytes of data.

Whatever part of the symbol is not used for the basic data is used for Reed–Solomon error correction. The split is fully configurable, subject to a minimum of 1 data word and 3 check words. The recommended number of check words is 23% of symbol capacity plus 3 codewords. [3]

Aztec Code is designed to produce readable codes with various printer technologies. It is also well suited to the displays of cell phones and other mobile devices.

Encoding

The encoding process consists of the following steps:

  1. Converting the source message to a string of bits
  2. Computing the necessary symbol size and mode message, which determines the Reed–Solomon codeword size
  3. Bit-stuffing the message into Reed–Solomon codewords
  4. Padding the message to a codeword boundary
  5. Appending check codewords
  6. Arranging the complete message in a spiral around the core

All conversion between bit strings and other forms is performed according to the big-endian (most significant bit first) convention.

Character set

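All 8-bit values can be encoded, plus two escape codes: FNC1 and ECI (Extended Channel Interpretation). Both are introduced by the FLG(n) code described below.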

By default, codes 0–127 are interpreted according to ANSI X3.4 (ASCII), and 128–255 are interpreted according to ISO/IEC 8859-1: Latin Alphabet No. 1. This corresponds to ECI 000003.

Bytes are translated into 4- and 5-bit codes, based on a current decoding mode, with shift and latch codes for changing modes. Byte values not available this way may be encoded using a general "binary shift" code, which is followed by a length and a number of 8-bit codes.

For changing modes, a shift affects only the interpretation of the single following code, while a latch affects all following codes. Most modes use 5-bit codes, but Digit mode uses 4-bit codes.

Aztec code character encoding

Code  Upper  Lower  Mixed  Punct   Digit      Code  Upper  Lower  Mixed  Punct
0     P/S    P/S    P/S    FLG(n)  P/S        16    O      o      ^\     +
1     SP     SP     SP     CR      SP         17    P      p      ^]     ,
2     A      a      ^A     CR LF   0          18    Q      q      ^^     -
3     B      b      ^B     . SP    1          19    R      r      ^_     .
4     C      c      ^C     , SP    2          20    S      s      @      /
5     D      d      ^D     : SP    3          21    T      t      \      :
6     E      e      ^E     !       4          22    U      u      ^      ;
7     F      f      ^F     "       5          23    V      v      _      <
8     G      g      ^G     #       6          24    W      w      `      =
9     H      h      ^H     $       7          25    X      x      |      >
10    I      i      ^I     %       8          26    Y      y      ~      ?
11    J      j      ^J     &       9          27    Z      z      ^?     [
12    K      k      ^K     '       ,          28    L/L    U/S    L/L    ]
13    L      l      ^L     (       .          29    M/L    M/L    U/L    {
14    M      m      ^M     )       U/L        30    D/L    D/L    P/L    }
15    N      n      ^[     *       U/S        31    B/S    B/S    B/S    U/L
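
As a rough illustration of how the table is used, the following Python sketch converts a message made only of uppercase letters, digits, and spaces into a bit string. It starts in Upper mode and switches with the D/L and U/L latch codes. The function name `encode_upper_digit` is hypothetical, the sketch ignores the other modes and all shift codes, and it makes no attempt to find the shortest encoding.

```python
def encode_upper_digit(text):
    """Encode A-Z, 0-9 and space using only Upper and Digit modes (a simplification)."""
    mode = "upper"                      # assumed starting mode
    out = []
    for ch in text:
        if ch == " ":
            # SP is code 1 in both modes; only the code width differs.
            out.append(format(1, "05b" if mode == "upper" else "04b"))
        elif "A" <= ch <= "Z":
            if mode == "digit":
                out.append(format(14, "04b"))   # U/L: latch from Digit back to Upper
                mode = "upper"
            out.append(format(ord(ch) - ord("A") + 2, "05b"))   # A..Z are codes 2..27
        elif "0" <= ch <= "9":
            if mode == "upper":
                out.append(format(30, "05b"))   # D/L: latch from Upper to Digit
                mode = "digit"
            out.append(format(ord(ch) - ord("0") + 2, "04b"))   # 0..9 are codes 2..11
        else:
            raise ValueError(f"character {ch!r} is outside this sketch's subset")
    return "".join(out)


if __name__ == "__main__":
    print(encode_upper_digit("AZ 09"))
```

Digits cost 4 bits each once the encoder has latched into Digit mode, which is why long numeric runs are cheaper than letters.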

B/S (binary shift) is followed by a 5-bit length. If non-zero, this indicates that 1–31 8-bit bytes follow. If zero, 11 additional length bits encode the number of following bytes less 31. (Note that for 32–62 bytes, two 5-bit byte shift sequences are more compact than one 11-bit.) At the end of the binary sequence, the previous mode is resumed.
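
The length field that follows a B/S code can be sketched as below. The helper name `binary_shift_length_bits` is hypothetical, and the upper bound of 31 + 2047 bytes is simply what an 11-bit field can express under the rule above, not a limit quoted from the specification.

```python
def binary_shift_length_bits(n):
    """Return the length bits that follow a B/S code for n bytes."""
    if not 1 <= n <= 31 + 2047:
        raise ValueError("byte count not representable in this sketch")
    if n <= 31:
        return format(n, "05b")                 # short form: 5-bit length
    # Long form: a zero 5-bit length, then 11 bits holding n - 31.
    return "00000" + format(n - 31, "011b")


if __name__ == "__main__":
    for n in (1, 31, 32, 62):
        print(n, binary_shift_length_bits(n))
```

The long form costs 16 length bits. Since the B/S code itself is 5 bits, one long sequence takes 21 bits of overhead while two short ones take 20, which is the saving the note above refers to.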

FLG(n) is followed by a 3-bit n value. n=0 encodes FNC1. n=1–6 is followed by 1–6 digits (in digit mode) which are zero-padded to make a 6-digit ECI identifier. n=7 is reserved and currently illegal.
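
A minimal sketch of an ECI escape, following the description above: the FLG(n) code (code 0 in Punct mode), the 3-bit digit count n, then each digit as a 4-bit Digit-mode code. The function name `eci_flag_bits` is hypothetical, and the sketch assumes the encoder has already shifted or latched into Punct mode.

```python
def eci_flag_bits(eci):
    """Bits for FLG(n) plus an ECI number, written without leading zeros."""
    if not 0 <= eci <= 999999:
        raise ValueError("ECI identifiers have at most 6 digits")
    digits = str(eci)                      # 1 to 6 digits; the reader zero-pads to 6
    bits = format(0, "05b")                # FLG(n): code 0 in Punct mode
    bits += format(len(digits), "03b")     # n = number of digits that follow
    for d in digits:
        bits += format(int(d) + 2, "04b")  # Digit-mode codes: 0..9 are codes 2..11
    return bits


if __name__ == "__main__":
    print(eci_flag_bits(3))    # the default interpretation, ECI 000003
```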

Mode message

The mode message encodes the number of layers (L layers encoded as the integer L−1), and the number of data codewords (D codewords, encoded as the integer D−1) in the message. All remaining codewords are used as check codewords.

For compact Aztec codes, the number of layers is encoded as a 2-bit value, and the number of data codewords as a 6-bit value, resulting in an 8-bit mode word. For full Aztec codes, the number of layers is encoded in 5 bits, and the number of data codewords is encoded in 11 bits, making a 16-bit mode word.
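
The packing of the mode word can be sketched as follows. The function name `mode_word_bits` is hypothetical, and the range checks reflect only the field widths given above.

```python
def mode_word_bits(layers, data_codewords, compact):
    """Pack L-1 and D-1 into an 8-bit (compact) or 16-bit (full) mode word."""
    layer_bits, data_bits = (2, 6) if compact else (5, 11)
    if not 1 <= layers <= 2 ** layer_bits:
        raise ValueError("layer count does not fit the field")
    if not 1 <= data_codewords <= 2 ** data_bits:
        raise ValueError("data codeword count does not fit the field")
    # Most significant bit first, layers field before the codeword field.
    return format(layers - 1, f"0{layer_bits}b") + format(data_codewords - 1, f"0{data_bits}b")


if __name__ == "__main__":
    print(mode_word_bits(1, 10, compact=True))
    print(mode_word_bits(32, 1, compact=False))
```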

The mode word is broken into two or four 4-bit codewords in GF(16), and 5 or 6 Reed–Solomon check words are appended, making a 28- or 40-bit mode message, which is wrapped in a 1-pixel layer around the core. Thus a (15,10) or (15,9) Reed–Solomon code over GF(16), shortened to (7,2) or (10,4) respectively, is used.

Because an (L+1)-layer compact Aztec code can hold more data than an L-layer full code, full codes with fewer than 4 layers are rarely used.

Most importantly, the number of layers determines the size of the Reed–Solomon codewords used. This varies from 6 to 12 bits:

Aztec code finite field polynomials

Bits  Field     Primitive polynomial   Generator polynomial (decimal coefficients)    Used for
4     GF(16)    x^4+x+1                x^5+11x^4+4x^3+6x^2+2x+1 (compact code)        Mode message
                                       x^6+7x^5+9x^4+3x^3+12x^2+10x+12 (full code)
6     GF(64)    x^6+x+1                depends on number of error correction words    1–2 layers
8     GF(256)   x^8+x^5+x^3+x^2+1      depends on number of error correction words    3–8 layers
10    GF(1024)  x^10+x^3+1             depends on number of error correction words    9–22 layers
12    GF(4096)  x^12+x^6+x^5+x^3+1     depends on number of error correction words    23–32 layers

The codeword size b is the smallest even number which ensures that the total number of codewords in the symbol stays within the limit of 2^b − 1 that a Reed–Solomon code over GF(2^b) can handle.
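
This rule can be checked against the table with a short sketch. It uses the capacity formulas given under "Laying out the message", namely (112+16*L)*L bits for a full code and (88+16*L)*L bits for a compact code. The function names are hypothetical, and rounding the codeword count down is an assumption.

```python
def capacity_bits(layers, compact):
    """Data-area capacity in bits for a symbol with the given number of layers."""
    base = 88 if compact else 112
    return (base + 16 * layers) * layers


def smallest_codeword_bits(layers, compact):
    """Smallest even b such that the codeword count is at most 2**b - 1."""
    b = 2
    while capacity_bits(layers, compact) // b > 2 ** b - 1:
        b += 2
    return b


def table_codeword_bits(layers):
    """Codeword size as listed in the finite field table."""
    if layers <= 2:
        return 6
    if layers <= 8:
        return 8
    if layers <= 22:
        return 10
    return 12


if __name__ == "__main__":
    for layers in (1, 2, 3, 8, 9, 22, 23, 32):
        print(layers, smallest_codeword_bits(layers, compact=False))
```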

As mentioned above, it is recommended that at least 23% of the available codewords, plus 3, are reserved for correction, and a symbol size is chosen such that the message will fit into the available space.
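
A minimal sketch of the recommended split, given the total number of codewords in a symbol. The name `recommended_split` is hypothetical, and rounding the 23% share up to a whole codeword is an assumption; the text above does not say how to round.

```python
def recommended_split(total_codewords):
    """Return (data_codewords, check_codewords) for the 23% + 3 recommendation."""
    check = (23 * total_codewords + 99) // 100 + 3   # ceil(23%) plus 3 (rounding assumed)
    data = total_codewords - check
    if data < 1:
        raise ValueError("symbol too small for at least one data codeword")
    return data, check


if __name__ == "__main__":
    # A 1-layer compact symbol holds 17 six-bit codewords.
    print(recommended_split(17))
```

An encoder would try symbol sizes in increasing order and pick the first one whose data budget holds the bit-stuffed message.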

Bit stuffing

The data bits are broken into codewords, with the first bit corresponding to the most significant coefficient. While doing this, code words of all zeros and all ones are avoided by bit stuffing: if the first b−1 bits of a code word have the same value, an extra bit with the complementary value is inserted into the data stream. This insertion takes place whether or not the last bit of the code word would have had the same value.

Also, note that this only applies to strings of b−1 bits at the beginning of a code word. Longer strings of identical bits are permitted as long as they straddle a code word boundary.

When decoding, a code word of all zero or all one may be assumed to be an erasure, and corrected more efficiently than a general error.

This process makes the message longer, and the final number of data codewords recorded in the mode message is not known until it is complete. In rare cases, it may be necessary to jump to the next-largest symbol and begin the process all over again to maintain the minimum fraction of check words.

Padding

After bit stuffing, the data string is padded to the next codeword boundary by appending 1 bits (bits set to one). If this would result in a code word of all ones, the last bit is changed to zero (and will be ignored by the decoder as a bit-stuffing bit). On decoding, the padding bits may be decoded as shift and latch codes, but that will not affect the message content. The reader must accept and ignore a partial code at the end of the message, as long as it is all-ones.
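
Bit stuffing and padding can be sketched together, working on a string of '0' and '1' characters. The function name `stuff_and_pad` is hypothetical. Padding the final partial word with ones before applying the stuffing test is one way to satisfy the rules above, not necessarily how any particular encoder does it.

```python
def stuff_and_pad(bits, b):
    """Split a bit string into b-bit codewords, avoiding all-zero and all-one words."""
    words = []
    i = 0
    while i < len(bits):
        chunk = bits[i:i + b]
        chunk += "1" * (b - len(chunk))     # pad a final partial word with 1 bits
        head = chunk[:b - 1]
        if head == "0" * (b - 1):
            words.append(head + "1")        # stuff a complementary 1
            i += b - 1                      # only b-1 message bits were consumed
        elif head == "1" * (b - 1):
            words.append(head + "0")        # stuff a 0 (also fixes an all-ones pad)
            i += b - 1
        else:
            words.append(chunk)
            i += b
    return words


if __name__ == "__main__":
    print(stuff_and_pad("000000", 6))
    print(stuff_and_pad("1010101", 6))
```

Because each stuffed word consumes only b−1 message bits, the output can be longer than the input, which is why the final data codeword count is not known in advance.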

Additionally, if the total number of data bits available in the symbol is not a multiple of the codeword size, the data string is prefixed with an appropriate number of 0 bits to occupy the extra space. These bits are not included in the check word computation.

Check codewords

Both the mode word and the data must have check words appended to fill out the available space. This is computed by appending K check words such that the entire message is a multiple of the Reed–Solomon polynomial (x−2)(x−4)...(x−2^K).
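
The computation can be sketched for the mode message, which uses GF(16) with primitive polynomial x^4+x+1. The class and function names here are hypothetical. The sketch builds the generator polynomial from the roots 2, 4, ..., 2^K and takes the remainder of the shifted message, a textbook systematic Reed–Solomon encoder rather than code taken from the specification.

```python
class GF:
    """Exp/log tables for GF(2**bits) built from a primitive polynomial."""

    def __init__(self, bits, prim_poly):
        self.order = (1 << bits) - 1
        self.exp = [0] * (2 * self.order)
        self.log = [0] * (self.order + 1)
        x = 1
        for i in range(self.order):
            self.exp[i] = self.exp[i + self.order] = x
            self.log[x] = i
            x <<= 1
            if x > self.order:          # overflowed the field: reduce
                x ^= prim_poly

    def mul(self, a, b):
        if a == 0 or b == 0:
            return 0
        return self.exp[self.log[a] + self.log[b]]


def generator_poly(gf, k):
    """Coefficients (highest degree first) of (x-2)(x-4)...(x-2^k)."""
    g = [1]
    for i in range(1, k + 1):
        root = gf.exp[i]
        shifted = g + [0]               # g(x) * x
        for j, coeff in enumerate(g):
            shifted[j + 1] ^= gf.mul(coeff, root)   # plus g(x) * root
        g = shifted
    return g


def check_words(gf, data, k):
    """Remainder of data(x) * x^k divided by the generator polynomial."""
    g = generator_poly(gf, k)
    rem = [0] * k
    for word in data:
        factor = word ^ rem[0]
        rem = rem[1:] + [0]
        for j in range(k):
            rem[j] ^= gf.mul(g[j + 1], factor)
    return rem


def evaluate(gf, poly, x):
    """Evaluate a polynomial (highest degree first) at x by Horner's rule."""
    acc = 0
    for coeff in poly:
        acc = gf.mul(acc, x) ^ coeff
    return acc


if __name__ == "__main__":
    gf16 = GF(4, 0b10011)               # x^4 + x + 1
    print(generator_poly(gf16, 5))      # compare with the compact-code row of the table
    mode_word = [0, 9]                  # two 4-bit words of a compact mode word
    print(mode_word + check_words(gf16, mode_word, 5))
```

The same functions apply to the data codewords by constructing `GF` with the field size and primitive polynomial listed for the symbol's layer count.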

Note that check words are not subject to bit stuffing, and may be all-zero or all-one. Thus, it is not possible to detect the erasure of a check word.

Laying out the message

9-layer (53×53) Aztec code with reference grid highlighted in red.

A full Aztec code symbol has, in addition to the core, a "reference grid" of alternating black and white pixels occupying every 16th row and column. A compact Aztec code does not contain this grid. These known pixels allow a reader to maintain alignment with the pixel grid over large symbols. For up to 4 layers (31×31 pixels), this consists only of single lines extending outward from the core, continuing the alternating pattern. Inside the 5th layer, however, additional rows and columns of alternating pixels are inserted ±16 pixels from the center, so the 5th layer is located ±17 and ±18 pixels from the center, and a 5-layer symbol is 37×37 pixels.

Likewise, additional reference grid rows and columns are inserted ±32 pixels from the center, making a 12-layer symbol 67×67 pixels. In this case, the 12th layer occupies rings ±31 and ±33 pixels from the center. The pattern continues indefinitely outward, with 15-pixel blocks of data separated by rows and columns of the reference grid.
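
The resulting symbol sizes can be reproduced with a small sketch. The function name `symbol_size` is hypothetical. It counts the non-grid rows on one side of the centre line, then adds one reference grid line for every 15 such rows that have been filled.

```python
def symbol_size(layers, compact=False):
    """Side length in pixels of an Aztec symbol with the given number of layers."""
    if compact:
        return 11 + 4 * layers          # 11x11 core, no reference grid
    half = 7 + 2 * layers               # non-grid rows on one side of the centre line
    grid_lines = (half - 1) // 15       # grid lines at +-16, +-32, ... on that side
    return 2 * (half + grid_lines) + 1  # both sides plus the centre line itself


if __name__ == "__main__":
    for layers in (4, 5, 9, 12, 32):
        print(layers, symbol_size(layers))
```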

One way to construct the symbol is to delete the reference grid entirely and begin with a 14×14-pixel core centered on a 2×2-pixel white square. Then break it into 15×15-pixel blocks and insert the reference grid between them.

The mode message begins at the top-left corner of the core and wraps around it clockwise in a 1-bit thick layer. It begins with the most significant bit of the number of layers and ends with the check words. For a compact Aztec code, it is broken into four 7-bit pieces to leave room for the orientation marks. For a full Aztec code, it is broken into four 10-bit pieces, and those pieces are each divided in half by the reference grid.

In some cases, the total capacity of the matrix is not a whole number of code words. In such cases, the main message is padded with 0 bits at the beginning. These bits are not included in the check word calculation and should be skipped during decoding. The total matrix capacity in bits can be calculated as (112+16*L)*L for a full Aztec code and (88+16*L)*L for a compact Aztec code, where L is the symbol size in layers. [4] As an example, the total matrix capacity of a compact Aztec code with 1 layer is 104 bits. Since code words are six bits, this gives 17 code words and two extra bits. Two zero bits are prepended to the message as padding and must be skipped during decoding.

The padded main message begins at the outer top-left of the entire symbol and spirals around it counterclockwise in a 2-bit thick layer, ending directly above the top-left corner of the core. This places the bit-stuffed data words, for which erasures can be detected, in the outermost layers of the symbol, which are most prone to erasures. The check words are stored closer to the core. The last check word ends just above the top left corner of the bull's eye.

With the core in its standard orientation, the first bit of the first data word is placed in the upper-left corner, with additional bits placed in a 2-bit-wide column left-to-right and top-to-bottom. This continues until 2 rows from the bottom of the symbol when the pattern rotates 90 degrees counterclockwise and continues in a 2-bit high row, bottom-to-top and left-to-right. After 4 equal-sized quarter layers, the spiral continues with the top-left corner of the next-inner layer, finally ending one pixel above the top-left corner of the core.

Finally, 1 bits are printed as black squares, and 0 bits are printed as white squares.

Usage

Online ticket by Deutsche Bahn. Note that the Aztec barcode in this sample ticket is not readable with a normal app because the center is different.

Transport

Aztec codes are widely used for transport ticketing.

The Aztec Code has been selected by the airline industry (IATA's BCBP standard) for electronic boarding passes. Several airlines send Aztec Codes to passengers' mobile phones to act as boarding passes. These are often integrated with apps on passengers' phones, including Apple Wallet.

Aztec codes are also used in rail, including by Tehran Metro, British National Rail, [5] Eurostar, Deutsche Bahn, TCDD Taşımacılık, DSB, SJ, České dráhy, Slovak Railways, Slovenian Railways, Croatian Railways, Trenitalia, Nederlandse Spoorwegen, Pasažieru vilciens, PKP Intercity, VR Group, Via Rail, Swiss Federal Railways, SNCB and SNCF for tickets sold online and printed out by customers or displayed on mobile phone screens. The Aztec code is scanned by a handheld scanner by on-train staff or at the turnstile to validate the ticket.

Governmental

Car registration documents in Poland bear a summary, compressed with the NRV2E algorithm and encoded as an Aztec Code. Work is underway to enable car insurance companies to automatically fill in the relevant information based on digital photographs of the document as the first step of closing a new insurance contract.

The Federal Tax Service in Russia encodes payment information in tax notices as Aztec Code.

Commercial

Many bills in Canada also carry Aztec codes, including those from EastLink, Shaw Cable, and Bell Aliant.

References

  1. US 5591956, Longacre, Jr., Andrew & Hussey, Robert, "Two Dimensional Data Encoding Structure and Symbology for use with Optical Readers", published 1997-01-07.
  2. Official Gazette. United States Patent Office. 17 June 1997. Hereby dedicates to the public the entire term of said patent. Click "images" then "correction" to see the dedication to the public domain.
  3. Adams, Russ. "2-Dimensional Bar Code Page". Archived from the original on 30 April 2010. Retrieved 14 July 2022.
  4. "Спецификация Aztec Code (без Small Aztec)" [Aztec Code Specification (without Small Aztec)] (in Russian). Archived from the original on 25 February 2020.
  5. "Reversing UK mobile rail tickets". eta.st. 31 January 2023. Retrieved 5 February 2023.