Adler-32

Last updated August 26, 2024

Adler-32 is a checksum algorithm written by Mark Adler in 1995,^[1] modifying Fletcher's checksum. Compared to a cyclic redundancy check of the same length, it trades reliability for speed. Adler-32 is more reliable than Fletcher-16, and slightly less reliable than Fletcher-32.^[2]

History

The Adler-32 checksum is part of the widely used zlib compression library, as both were developed by Mark Adler. A "rolling checksum" version of Adler-32 is used in the rsync utility.

Calculation

An Adler-32 checksum is obtained by calculating two 16-bit checksums A and B and concatenating their bits into a 32-bit integer. A is the sum of all bytes in the stream plus one, and B is the sum of the individual values of A from each step.

At the beginning of an Adler-32 run, A is initialized to 1, B to 0. The sums are done modulo 65521 (the largest prime number smaller than 2¹⁶). The bytes are stored in network order (big endian), B occupying the two most significant bytes.

The function may be expressed as

A = 1 + D₁ + D₂ + ... + Dₙ (mod 65521)

B = (1 + D₁) + (1 + D₁ + D₂) + ... + (1 + D₁ + D₂ + ... + Dₙ) (mod 65521)
  = n×D₁ + (n−1)×D₂ + (n−2)×D₃ + ... + Dₙ + n (mod 65521)

Adler-32(D) = B × 65536 + A

where D is the string of bytes for which the checksum is to be calculated, and n is the length of D.
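The second form of B makes it possible to update the checksum of a fixed-length window in constant time when the window slides forward by one byte, which is the idea behind a rolling checksum. Dropping the first byte D₁ and appending a new byte gives A' = A − D₁ + D_new and B' = B − n×D₁ + A' − 1 (both mod 65521). The following is a minimal sketch of that update; the type and function names are hypothetical, and it is not claimed to match the code of any particular tool.

```c
#include <stddef.h>
#include <stdint.h>

#define MOD_ADLER 65521u

/* Hypothetical helper type: running sums for a window of fixed length n. */
typedef struct {
    uint32_t a;
    uint32_t b;
    uint32_t n;
} adler_window;

/* Compute A and B for the first n bytes of data. */
static adler_window adler_window_init(const unsigned char *data, uint32_t n)
{
    adler_window w = {1, 0, n};
    for (uint32_t i = 0; i < n; ++i) {
        w.a = (w.a + data[i]) % MOD_ADLER;
        w.b = (w.b + w.a) % MOD_ADLER;
    }
    return w;
}

/* Slide the window by one byte: drop `out`, append `in`.
   A' = A - out + in
   B' = B - n*out + A' - 1
   MOD_ADLER is added before each subtraction to keep values non-negative. */
static void adler_window_roll(adler_window *w, unsigned char out, unsigned char in)
{
    uint32_t n_out = (uint32_t)(((uint64_t)w->n * out) % MOD_ADLER);
    w->a = (w->a + MOD_ADLER - out + in) % MOD_ADLER;
    w->b = (w->b + MOD_ADLER - n_out + w->a + MOD_ADLER - 1) % MOD_ADLER;
}

/* Combine the two sums into the 32-bit checksum. */
static uint32_t adler_window_value(const adler_window *w)
{
    return (w->b << 16) | w->a;
}
```

After each call to adler_window_roll, the value should equal the checksum obtained by recomputing the window from scratch.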

Example

The Adler-32 sum of the ASCII string "Wikipedia" would be calculated as follows:

Character	ASCII code (base 10)	A	B
W	87	1 + 87 = 88	0 + 88 = 88
i	105	88 + 105 = 193	88 + 193 = 281
k	107	193 + 107 = 300	281 + 300 = 581
i	105	300 + 105 = 405	581 + 405 = 986
p	112	405 + 112 = 517	986 + 517 = 1503
e	101	517 + 101 = 618	1503 + 618 = 2121
d	100	618 + 100 = 718	2121 + 718 = 2839
i	105	718 + 105 = 823	2839 + 823 = 3662
a	97	823 + 97 = 920	3662 + 920 = 4582

A = 920 = 0x398 (base 16)
B = 4582 = 0x11E6
Output = (0x11E6 << 16) + 0x398 = 0x11E60398 = 300286872

The modulo operation had no effect in this example, since none of the values reached 65521.
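The step-by-step calculation can be checked mechanically. The sketch below records A and B after every byte so the intermediate values can be compared with a hand calculation; adler32_trace is a hypothetical helper name introduced only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MOD_ADLER 65521u

/* Hypothetical helper: stores A and B after each input byte.
   a_steps and b_steps must each have room for len entries.
   Returns the final 32-bit checksum. */
static uint32_t adler32_trace(const unsigned char *data, size_t len,
                              uint32_t *a_steps, uint32_t *b_steps)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; ++i) {
        a = (a + data[i]) % MOD_ADLER;
        b = (b + a) % MOD_ADLER;
        a_steps[i] = a;
        b_steps[i] = b;
    }
    return (b << 16) | a;
}
```

Called on the nine bytes of "Wikipedia", the last recorded values are A = 920 and B = 4582, and the return value is 0x11E60398.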

Comparison with the Fletcher checksum

The first difference between the two algorithms is that Adler-32 sums are calculated modulo a prime number, whereas Fletcher sums are calculated modulo 2⁴−1, 2⁸−1, or 2¹⁶−1 (depending on the number of bits used), which are all composite numbers. Using a prime number makes it possible for Adler-32 to catch differences in certain combinations of bytes that Fletcher is unable to detect.

The second difference, which has the largest effect on the speed of the algorithm, is that the Adler sums are computed over 8-bit bytes rather than 16-bit words, resulting in twice the number of loop iterations. As a result, the Adler-32 checksum takes between one and a half and two times as long as Fletcher's checksum for 16-bit word-aligned data. For byte-aligned data, Adler-32 is faster than a properly implemented Fletcher's checksum (e.g., one found in the Hierarchical Data Format).

Example implementation

In C, an inefficient but straightforward implementation is:

    #include <stddef.h>
    #include <stdint.h>

    const uint32_t MOD_ADLER = 65521;

    /* data is the location of the data in physical memory and
       len is the length of the data in bytes */
    uint32_t adler32(unsigned char *data, size_t len)
    {
        uint32_t a = 1, b = 0;
        size_t index;

        // Process each byte of the data in order
        for (index = 0; index < len; ++index)
        {
            a = (a + data[index]) % MOD_ADLER;
            b = (b + a) % MOD_ADLER;
        }

        return (b << 16) | a;
    }

See the zlib source code for a more efficient implementation that requires a fetch and two additions per byte, with the modulo operations deferred so that the two remainders are computed only once every several thousand bytes, a technique first discovered for Fletcher checksums in 1988. js-adler32 provides a similar optimization, with an additional trick that makes the modulo cheaper: because 65536 − 65521 = 15, 65536 is congruent to 15 modulo 65521, so ((a >> 16) * 15 + (a & 65535)) % 65521 gives the same result as a % 65521 while operating on a much smaller number.^[3]
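The deferred-modulo idea can be sketched as follows. This is an illustration, not zlib's actual code: the function names are hypothetical, and the only detail taken from zlib is the block length 5552 (its NMAX constant), the largest number of bytes that can be accumulated in 32-bit sums without overflow before reducing. The second function shows the fold by 15 in isolation.

```c
#include <stddef.h>
#include <stdint.h>

#define MOD_ADLER 65521u
#define NMAX 5552u /* bytes that can be summed before a reduction is needed */

/* Adler-32 with the two modulo operations done once per block
   instead of once per byte. */
static uint32_t adler32_deferred(const unsigned char *data, size_t len)
{
    uint32_t a = 1, b = 0;
    while (len > 0) {
        size_t block = len < NMAX ? len : NMAX;
        len -= block;
        while (block--) {
            a += *data++; /* one fetch and two additions per byte */
            b += a;
        }
        a %= MOD_ADLER;
        b %= MOD_ADLER;
    }
    return (b << 16) | a;
}

/* Since 65536 is congruent to 15 modulo 65521, the high half of x can be
   folded into the low half before the final reduction. */
static uint32_t fold_mod_65521(uint32_t x)
{
    return ((x >> 16) * 15u + (x & 0xFFFFu)) % MOD_ADLER;
}
```

For any input, adler32_deferred should return the same value as the byte-at-a-time version, and fold_mod_65521(x) should equal x % 65521.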

Advantages and disadvantages

Like the standard CRC-32, the Adler-32 checksum can be forged easily and is therefore unsafe for protecting against intentional modification.
Adler-32 is faster than CRC-32 on many platforms.^[4]
Adler-32 has a weakness for short messages with a few hundred bytes, because the checksums for these messages have a poor coverage of the 32 available bits.

Weakness

Adler-32 is weak for short messages because the sum A does not wrap. The maximum sum of a 128-byte message is 32640, which is below the value 65521 used by the modulo operation, meaning that roughly half of the output space is unused, and the distribution within the used part is nonuniform. An extended explanation can be found in RFC 3309, which mandates the use of CRC32C instead of Adler-32 for SCTP, the Stream Control Transmission Protocol.^[5] Adler-32 has also been shown to be weak for small incremental changes,^[6] and also weak for strings generated from a common prefix and consecutive numbers (like auto-generated label names by typical code generators).^[7]
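The bound on A can be made concrete with a small calculation. The sketch below (hypothetical helper names) computes the largest value the unreduced sum A can reach for a given message length, counting the initial value 1 in addition to the byte sum, and reports whether that value can reach the modulus at all.

```c
#include <stddef.h>
#include <stdint.h>

#define MOD_ADLER 65521u

/* Upper bound on the unreduced sum A for a message of len bytes:
   A starts at 1 and every byte contributes at most 255. */
static uint64_t adler32_a_upper_bound(size_t len)
{
    return 1u + 255u * (uint64_t)len;
}

/* Returns 1 if a message of len bytes can make A reach the modulus
   (and therefore wrap), 0 otherwise. */
static int adler32_a_can_wrap(size_t len)
{
    return adler32_a_upper_bound(len) >= MOD_ADLER;
}
```

For a 128-byte message the bound is 32641, so the most significant bit of the 16-bit A field is never set. Under this bound, A cannot wrap for any message of 256 bytes or fewer.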

Notes

[1] "First appearance of Adler-32 (see ChangeLog and adler32.c)".
[2] "Revisiting Fletcher and Adler Checksums" (PDF).
[3] "adler32.js". Sheet JS. 3 July 2019.
[4] Theresa C. Maxino, Philip J. Koopman (January 2009). "The Effectiveness of Checksums for Embedded Control Networks" (PDF). IEEE Transactions on Dependable and Secure Computing.
[5] RFC 3309
[6] "Cbloom rants: 08-21-10 - Adler32". 21 August 2010.
[7] "Hash functions: An empirical comparison - strchr.com". www.strchr.com.

External links

RFC 1950 – specification, contains example C code
ZLib – implements the Adler-32 checksum in adler32.c
Chrome – uses a SIMD implementation of Adler-32 in adler32_simd.c
RFC 3309 – information about the short message weakness and related change to SCTP

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
