BSD checksum

Last updated March 29, 2022

The BSD checksum algorithm was a commonly used, legacy checksum algorithm. It has been implemented in old BSD and is also available through the sum command line utility.

Computation of the BSD checksum

Below is the relevant part of the GNU sum source code (GPL licensed). It computes a 16-bit checksum by adding up all bytes (8-bit words) of the input data stream. In order to avoid many of the weaknesses of simply adding the data, the checksum accumulator is circular rotated to the right by one bit at each step before the new char is added.

intbsdChecksumFromFile(FILE*fp)/* The file handle for input data */{intchecksum=0;/* The checksum mod 2^16. */for(intch=getc(fp);ch!=EOF;ch=getc(fp)){checksum=(checksum>>1)+((checksum&1)<<15);checksum+=ch;checksum&=0xffff;/* Keep it within bounds. */}returnchecksum;}

Description of the algorithm

As mentioned above, this algorithm computes a checksum by segmenting the data and adding it to an accumulator that is circular right shifted between each summation. To keep the accumulator within return value bounds, bit-masking with 1's is done.

Example: Calculating a 4-bit checksum using 4-bit sized segments (big-endian)

Input: 101110001110 -> three segments: 1011, 1000, 1110.

Iteration 1:

 segment: 1011        checksum: 0000        bitmask: 1111

a) Apply circular shift to the checksum:

 0000 -> 0000

b) Add checksum and segment together, apply bitmask onto the obtained result:

 0000 + 1011 = 1011 -> 1011 & 1111 = 1011

Iteration 2:

 segment: 1000        checksum: 1011        bitmask: 1111

a) Apply circular shift to the checksum:

 1011 -> 1101

b) Add checksum and segment together, apply bitmask onto the obtained result:

 1101 + 1000 = 10101 -> 10101 & 1111 = 0101

Iteration 3:

 segment: 1110        checksum: 0101        bitmask: 1111

a) Apply circular shift to the checksum:

 0101 -> 1010

b) Add checksum and segment together, apply bitmask onto the obtained result:

 1010 + 1110 = 11000 -> 11000 & 1111 = 1000

Final checksum: 1000

Related Research Articles

In computing and electronic systems, binary-coded decimal (BCD) is a class of binary encodings of decimal numbers where each digit is represented by a fixed number of bits, usually four or eight. Sometimes, special bit patterns are used for a sign or other indications.

Golden ratio base is a non-integer positional numeral system that uses the golden ratio as its base. It is sometimes referred to as base-φ, golden mean base, phi-base, or, colloquially, phinary. Any non-negative real number can be represented as a base-φ numeral using only the digits 0 and 1, and avoiding the digit sequence "11" – this is called a standard form. A base-φ numeral that includes the digit sequence "11" can always be rewritten in standard form, using the algebraic properties of the base φ — most notably that φ + 1 = φ². For instance, 11_φ = 100_φ.

In computer programming, a bitwise operation operates on a bit string, a bit array or a binary numeral at the level of its individual bits. It is a fast and simple action, basic to the higher-level arithmetic operations and directly supported by the processor. Most bitwise operations are presented as two-operand instructions where the result replaces one of the input operands.

ISAAC is a cryptographically secure pseudorandom number generator and a stream cipher designed by Robert J. Jenkins Jr. in 1993. The reference implementation source code was dedicated to the public domain.

Two's complement is a mathematical operation on binary numbers, and is an example of a radix complement. It is used in computer science as a method of signed number representation. When the Most Significant Bit is a one, the number is signed as negative..

md5sum is a computer program that calculates and verifies 128-bit MD5 hashes, as described in RFC 1321. The MD5 hash functions as a compact digital fingerprint of a file. As with all such hashing algorithms, there is theoretically an unlimited number of files that will have any given MD5 hash. However, it is very unlikely that any two non-identical files in the real world will have the same MD5 hash, unless they have been specifically created to have the same hash.

The Fletcher checksum is an algorithm for computing a position-dependent checksum devised by John G. Fletcher (1934–2012) at Lawrence Livermore Labs in the late 1970s. The objective of the Fletcher checksum was to provide error-detection properties approaching those of a cyclic redundancy check but with the lower computational effort associated with summation techniques.

Booth's multiplication algorithm is a multiplication algorithm that multiplies two signed binary numbers in two's complement notation. The algorithm was invented by Andrew Donald Booth in 1950 while doing research on crystallography at Birkbeck College in Bloomsbury, London. Booth's algorithm is of interest in the study of computer architecture.

A rolling hash is a hash function where the input is hashed in a window that moves through the input.

A binary multiplier is an electronic circuit used in digital electronics, such as a computer, to multiply two binary numbers.

In computing, minifloats are floating-point values represented with very few bits. Predictably, they are not well suited for general-purpose numerical calculations. They are used for special purposes, most often in computer graphics, where iterations are small and precision has aesthetic effects. Machine learning also uses similar formats like bfloat16. Additionally, they are frequently encountered as a pedagogical tool in computer-science courses to demonstrate the properties and structures of floating-point arithmetic and IEEE 754 numbers.

In computer science, the double dabble algorithm is used to convert binary numbers into binary-coded decimal (BCD) notation. It is also known as the shift-and-add-3 algorithm, and can be implemented using a small number of gates in computer hardware, but at the expense of high latency.

sum is a legacy utility available on some Unix and Unix-like operating systems. This utility outputs the checksum of each argument file, as well as the number of blocks they take on disk.

In coding theory, a standard array is a $by array that lists all elements of a particular vector space. Standard arrays are used to decode linear codes; i.e. to find the corresponding codeword for any received vector.$

Fast inverse square root Root-finding algorithm

Fast inverse square root, sometimes referred to as Fast InvSqrt or by the hexadecimal constant 0x5F3759DF, is an algorithm that estimates $, the reciprocal of the square root of a 32-bit floating-point number in IEEE 754 floating-point format. This operation is used in digital signal processing to normalize a vector, i.e., scale it to length 1. For example, computer graphics programs use inverse square roots to compute angles of incidence and reflection for lighting and shading. The algorithm is best known for its implementation in 1999 in the source code of Quake III Arena, a first-person shooter video game that made heavy use of 3D graphics. The algorithm only started appearing on public forums such as Usenet in 2002 or 2003. Computation of square roots usually depends upon many division operations, which for floating point numbers are computationally expensive. The fast inverse square generated a good approximation with only one division step. Other video games predating Quake 3 have been discovered which use a similar algorithm, though the Quake implementation remains the best-known example.$

The SYSV checksum algorithm was a commonly used, legacy checksum algorithm. It has been implemented in UNIX System V and is also available through the sum command line utility.

Reduction of summands is an algorithm for fast binary multiplication of non-signed binary integers. It is performed in three steps: production of summands, reduction of summands, and summation.

The ones' complement of a binary number is the value obtained by inverting all the bits in the binary representation of the number. This mathematical operation is primarily of interest in computer science, where it has varying effects depending on how a specific computer represents numbers.

In the C programming language, operations can be performed on a bit level using bitwise operators.

A half-carry flag is a condition flag bit in the status register of many CPU families, such as the Intel 8080, Zilog Z80, the x86, and the Atmel AVR series, among others. It indicates when a carry or borrow has been generated out of the least significant four bits of the accumulator register following the execution of an arithmetic instruction. It is primarily used in decimal (BCD) arithmetic instructions.

References

↑ sum(1) — manual pages from GNU coreutils
↑ sum(1) – FreeBSD General Commands Manual

Sources

official FreeBSD sum source code
official GNU sum manual page
coreutils download page --- find and unpack the newest version of the coreutils package, read src/sum.c

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[sum-invocation-1] sum(1) — manual pages from GNU coreutils

[fbsd-2] sum(1) – FreeBSD General Commands Manual

[1]

[2]

BSD checksum

Contents

Computation of the BSD checksum

Description of the algorithm

Related Research Articles

References

Sources