Hamming weight

The Hamming weight of a string is the number of symbols that are different from the zero-symbol of the alphabet used. It is thus equivalent to the Hamming distance from the all-zero string of the same length. For the most typical case, a string of bits, this is the number of 1's in the string, or the digit sum of the binary representation of a given number and the ℓ₁ norm of a bit vector. In this binary case, it is also called the population count, [1] popcount, sideways sum, [2] or bit summation. [3]

Examples
| String | Hamming weight |
|---|---|
| 11101 | 4 |
| 11101000 | 4 |
| 00000000 | 0 |
| 678012340567 | 10 |

A plot for the population count (Hamming weight for binary numbers) for (decimal) numbers 0 to 256. [4] [5] [6]

History and usage

The Hamming weight is named after Richard Hamming, although he did not originate the notion. [7] The Hamming weight of binary numbers was already used in 1899 by James W. L. Glaisher to give a formula for the number of odd binomial coefficients in a single row of Pascal's triangle. [8] In 1954, Irving S. Reed introduced a concept equivalent to Hamming weight in the binary case. [9]

Hamming weight is used in several disciplines including information theory, coding theory, and cryptography. Examples of applications of the Hamming weight include:

Efficient implementation

The population count of a bitstring is often needed in cryptography and other applications. The Hamming distance of two words A and B can be calculated as the Hamming weight of A xor B. [1]
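As a minimal sketch of the XOR identity above, the following C fragment computes the Hamming distance of two 64-bit words. The function names `popcount64` and `hamming_distance64` are illustrative choices, not names from any library, and the bit count uses a simple portable loop rather than a tuned implementation.

```c
#include <stdint.h>

/* Portable bit count: each iteration clears the lowest set bit.
   The name popcount64 is an illustrative assumption. */
int popcount64(uint64_t x)
{
    int count = 0;
    while (x) {
        x &= x - 1;   /* clear the least significant 1 bit */
        count++;
    }
    return count;
}

/* Hamming distance of two words = Hamming weight of their XOR,
   because a ^ b has a 1 exactly where a and b differ. */
int hamming_distance64(uint64_t a, uint64_t b)
{
    return popcount64(a ^ b);
}
```

For instance, with a = 0xB5 (1011 0101) and b = 0x9D (1001 1101), a ^ b is 0x28 (0010 1000), so `hamming_distance64(a, b)` evaluates to 2.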

The problem of how to implement it efficiently has been widely studied. A single operation for the calculation, or parallel operations on bit vectors, are available on some processors. For processors lacking those features, the best solutions known are based on adding counts in a tree pattern. For example, to count the number of 1 bits in the 16-bit binary number a = 0110 1100 1011 1010, these operations can be done:

| Expression | Binary | Decimal | Comment |
|---|---|---|---|
| a | 0110 1100 1011 1010 | 27834 | The original number |
| b0 = (a >> 0) & 01 01 01 01 01 01 01 01 | 01 00 01 00 00 01 00 00 | 1, 0, 1, 0, 0, 1, 0, 0 | Every other bit from a |
| b1 = (a >> 1) & 01 01 01 01 01 01 01 01 | 00 01 01 00 01 01 01 01 | 0, 1, 1, 0, 1, 1, 1, 1 | The remaining bits from a |
| c = b0 + b1 | 01 01 10 00 01 10 01 01 | 1, 1, 2, 0, 1, 2, 1, 1 | Count of 1s in each 2-bit slice of a |
| d0 = (c >> 0) & 0011 0011 0011 0011 | 0001 0000 0010 0001 | 1, 0, 2, 1 | Every other count from c |
| d2 = (c >> 2) & 0011 0011 0011 0011 | 0001 0010 0001 0001 | 1, 2, 1, 1 | The remaining counts from c |
| e = d0 + d2 | 0010 0010 0011 0010 | 2, 2, 3, 2 | Count of 1s in each 4-bit slice of a |
| f0 = (e >> 0) & 00001111 00001111 | 00000010 00000010 | 2, 2 | Every other count from e |
| f4 = (e >> 4) & 00001111 00001111 | 00000010 00000011 | 2, 3 | The remaining counts from e |
| g = f0 + f4 | 00000100 00000101 | 4, 5 | Count of 1s in each 8-bit slice of a |
| h0 = (g >> 0) & 0000000011111111 | 0000000000000101 | 5 | Every other count from g |
| h8 = (g >> 8) & 0000000011111111 | 0000000000000100 | 4 | The remaining counts from g |
| i = h0 + h8 | 0000000000001001 | 9 | Count of 1s in entire 16-bit word |
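
The rows of the table can be condensed into a few lines of C. This is only a sketch for the 16-bit case: the function name `popcount16_tree` is hypothetical, the local variables c, e, g and i are named after the table rows, and each "every other" / "remaining" pair is folded into a single expression.

```c
#include <stdint.h>

/* 16-bit population count by adding neighbouring fields in a tree pattern.
   The name popcount16_tree is an illustrative assumption. */
int popcount16_tree(uint16_t a)
{
    uint16_t c = (a & 0x5555) + ((a >> 1) & 0x5555); /* count of 1s per 2-bit slice */
    uint16_t e = (c & 0x3333) + ((c >> 2) & 0x3333); /* count of 1s per 4-bit slice */
    uint16_t g = (e & 0x0F0F) + ((e >> 4) & 0x0F0F); /* count of 1s per 8-bit slice */
    uint16_t i = (g & 0x00FF) + ((g >> 8) & 0x00FF); /* count of 1s in the 16-bit word */
    return i;
}
```

Calling `popcount16_tree(0x6CBA)` (the value a = 27834 from the table) yields 9, matching the last row.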

Here, the operations are as in the C programming language, so X >> Y means to shift X right by Y bits, X & Y means the bitwise AND of X and Y, and + is ordinary addition. The best algorithms known for this problem are based on the concept illustrated above and are given here: [1]

    #include <stdint.h>

    //types and constants used in the functions below
    //uint64_t is an unsigned 64-bit integer variable type (defined in C99 version of C language)
    const uint64_t m1  = 0x5555555555555555; //binary: 0101...
    const uint64_t m2  = 0x3333333333333333; //binary: 00110011..
    const uint64_t m4  = 0x0f0f0f0f0f0f0f0f; //binary:  4 zeros,  4 ones ...
    const uint64_t m8  = 0x00ff00ff00ff00ff; //binary:  8 zeros,  8 ones ...
    const uint64_t m16 = 0x0000ffff0000ffff; //binary: 16 zeros, 16 ones ...
    const uint64_t m32 = 0x00000000ffffffff; //binary: 32 zeros, 32 ones
    const uint64_t h01 = 0x0101010101010101; //the sum of 256 to the power of 0,1,2,3...

    //This is a naive implementation, shown for comparison,
    //and to help in understanding the better functions.
    //This algorithm uses 24 arithmetic operations (shift, add, and).
    int popcount64a(uint64_t x)
    {
        x = (x & m1 ) + ((x >>  1) & m1 ); //put count of each  2 bits into those  2 bits
        x = (x & m2 ) + ((x >>  2) & m2 ); //put count of each  4 bits into those  4 bits
        x = (x & m4 ) + ((x >>  4) & m4 ); //put count of each  8 bits into those  8 bits
        x = (x & m8 ) + ((x >>  8) & m8 ); //put count of each 16 bits into those 16 bits
        x = (x & m16) + ((x >> 16) & m16); //put count of each 32 bits into those 32 bits
        x = (x & m32) + ((x >> 32) & m32); //put count of each 64 bits into those 64 bits
        return x;
    }

    //This uses fewer arithmetic operations than any other known
    //implementation on machines with slow multiplication.
    //This algorithm uses 17 arithmetic operations.
    int popcount64b(uint64_t x)
    {
        x -= (x >> 1) & m1;             //put count of each 2 bits into those 2 bits
        x = (x & m2) + ((x >> 2) & m2); //put count of each 4 bits into those 4 bits
        x = (x + (x >> 4)) & m4;        //put count of each 8 bits into those 8 bits
        x += x >>  8;  //put count of each 16 bits into their lowest 8 bits
        x += x >> 16;  //put count of each 32 bits into their lowest 8 bits
        x += x >> 32;  //put count of each 64 bits into their lowest 8 bits
        return x & 0x7f;
    }

    //This uses fewer arithmetic operations than any other known
    //implementation on machines with fast multiplication.
    //This algorithm uses 12 arithmetic operations, one of which is a multiply.
    int popcount64c(uint64_t x)
    {
        x -= (x >> 1) & m1;             //put count of each 2 bits into those 2 bits
        x = (x & m2) + ((x >> 2) & m2); //put count of each 4 bits into those 4 bits
        x = (x + (x >> 4)) & m4;        //put count of each 8 bits into those 8 bits
        return (x * h01) >> 56;  //returns left 8 bits of x + (x<<8) + (x<<16) + (x<<24) + ...
    }

The above implementations have the best worst-case behavior of any known algorithm. However, when a value is expected to have few nonzero bits, it may instead be more efficient to use algorithms that count these bits one at a time. As Wegner described in 1960, [14] the bitwise AND of x with x − 1 differs from x only in zeroing out the least significant nonzero bit: subtracting 1 changes the rightmost string of 0s to 1s, and changes the rightmost 1 to a 0. If x originally had n bits that were 1, then after only n iterations of this operation, x will be reduced to zero. The following implementation is based on this principle.

    //This is better when most bits in x are 0
    //This algorithm works the same for all data sizes.
    //This algorithm uses 3 arithmetic operations and 1 comparison/branch per "1" bit in x.
    int popcount64d(uint64_t x)
    {
        int count;
        for (count = 0; x; count++)
            x &= x - 1;
        return count;
    }

If greater memory usage is allowed, we can calculate the Hamming weight faster than the above methods. With unlimited memory, we could simply create a large lookup table of the Hamming weight of every 64-bit integer. If we can store a lookup table of the Hamming weight of every 16-bit integer, we can do the following to compute the Hamming weight of any 32-bit integer.

    #include <stdint.h>

    static uint8_t wordbits[65536] = { /* bitcounts of integers 0 through 65535, inclusive */ };

    //This algorithm uses 3 arithmetic operations and 2 memory reads.
    int popcount32e(uint32_t x)
    {
        return wordbits[x & 0xFFFF] + wordbits[x >> 16];
    }

    //Optionally, the wordbits[] table could be filled using this function
    //(declared void, since it returns no value)
    void popcount32e_init(void)
    {
        uint32_t i;
        uint16_t x;
        int count;
        for (i = 0; i <= 0xFFFF; i++)
        {
            x = i;
            for (count = 0; x; count++) // borrowed from popcount64d() above
                x &= x - 1;
            wordbits[i] = count;
        }
    }

Muła et al. [15] have shown that a vectorized version of popcount64b can run faster than dedicated instructions (e.g., popcnt on x64 processors).

Minimum weight

In error-correcting coding, the minimum Hamming weight, commonly referred to as the minimum weight wmin of a code, is the weight of the lowest-weight non-zero code word. The weight w of a code word is the number of 1s in the word. For example, the word 11001010 has a weight of 4.

In a linear block code the minimum weight is also the minimum Hamming distance (dmin) and defines the error correction capability of the code. If wmin = n, then dmin = n and the code will correct up to ⌊(dmin − 1)/2⌋ errors. [16]
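
The equality of minimum weight and minimum distance in a linear block code can be checked by brute force on a small code. The sketch below assumes a (7,4) Hamming code with one particular systematic generator matrix. That matrix and the names `G`, `encode`, `min_weight` and `min_distance` are illustrative choices, not taken from the cited source.

```c
#include <stdint.h>

/* Generator rows of a (7,4) Hamming code, one 7-bit word per row:
   1000110, 0100101, 0010011, 0001111 (an assumed, illustrative matrix). */
static const uint8_t G[4] = { 0x46, 0x25, 0x13, 0x0F };

/* Hamming weight of an 8-bit word, clearing one set bit per iteration. */
static int weight8(uint8_t x)
{
    int n = 0;
    while (x) {
        x &= (uint8_t)(x - 1);
        n++;
    }
    return n;
}

/* Encode a 4-bit message (0..15) as the XOR of the selected generator rows. */
static uint8_t encode(unsigned msg)
{
    uint8_t cw = 0;
    for (int i = 0; i < 4; i++)
        if (msg & (1u << i))
            cw ^= G[i];
    return cw;
}

/* Minimum weight over all non-zero code words. */
int min_weight(void)
{
    int best = 8;
    for (unsigned m = 1; m < 16; m++) {
        int w = weight8(encode(m));
        if (w < best)
            best = w;
    }
    return best;
}

/* Minimum Hamming distance over all pairs of distinct code words. */
int min_distance(void)
{
    int best = 8;
    for (unsigned a = 0; a < 16; a++)
        for (unsigned b = a + 1; b < 16; b++) {
            int d = weight8((uint8_t)(encode(a) ^ encode(b)));
            if (d < best)
                best = d;
        }
    return best;
}
```

For this code both functions return 3. The two agree because the XOR of two code words of a linear code is itself a code word, so every pairwise distance is the weight of some non-zero code word.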

Language support

Some C compilers provide intrinsic functions for bit counting. For example, GCC (since version 3.4 in April 2004) includes a builtin function __builtin_popcount that will use a processor instruction if available or an efficient library implementation otherwise. [17] LLVM-GCC has included this function since version 1.5 in June 2005. [18]
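
A hedged sketch of how such a builtin might be wrapped follows. It assumes GCC or a compatible compiler such as Clang, detected through the `__GNUC__` macro, and uses `__builtin_popcountll`, the `unsigned long long` variant of the builtin named above. The wrapper name `popcount_u64` and the fallback loop are illustrative.

```c
#include <stdint.h>

/* Count the 1 bits of a 64-bit word.
   The name popcount_u64 is an illustrative assumption. */
int popcount_u64(uint64_t x)
{
#if defined(__GNUC__)
    /* GCC/Clang builtin; compiles to a hardware instruction where the
       target supports one, otherwise to a library routine. */
    return __builtin_popcountll(x);
#else
    /* Portable fallback: clear the lowest set bit until none remain. */
    int count = 0;
    while (x) {
        x &= x - 1;
        count++;
    }
    return count;
#endif
}
```

Whether the builtin is lowered to a single instruction depends on the target options passed to the compiler, so the wrapper keeps callers independent of that choice.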

In the C++ Standard Library, the bit-array data structure bitset has a count() method that counts the number of bits that are set. In C++20, a new header <bit> was added, containing functions std::popcount and std::has_single_bit, taking arguments of unsigned integer types.

In Java, the growable bit-array data structure BitSet has a BitSet.cardinality() method that counts the number of bits that are set. In addition, there are Integer.bitCount(int) and Long.bitCount(long) functions to count bits in primitive 32-bit and 64-bit integers, respectively. The BigInteger arbitrary-precision integer class also has a BigInteger.bitCount() method that counts bits.

In Python, the int type has a bit_count() method to count the number of bits set. This functionality was introduced in Python 3.10, released in October 2021. [19]

In Common Lisp, the function logcount, given a non-negative integer, returns the number of 1 bits. (For negative integers it returns the number of 0 bits in 2's complement notation.) In either case the integer can be a BIGNUM.

Starting in GHC 7.4, the Haskell base package has a popCount function available on all types that are instances of the Bits class (available from the Data.Bits module). [20]

The MySQL dialect of SQL provides BIT_COUNT() as a standard function. [21]

Fortran 2008 has the standard, intrinsic, elemental function popcnt returning the number of nonzero bits within an integer (or integer array). [22]

Some programmable scientific pocket calculators feature special commands to calculate the number of set bits, e.g. #B on the HP-16C [3] [23] and WP 43S, [24] [25] #BITS [26] [27] or BITSUM [28] [29] on HP-16C emulators, and nBITS on the WP 34S. [30] [31]

Free Pascal has implemented popcnt since version 3.0. [32]

Processor support

See also

References

  1. Warren Jr., Henry S. (2013) [2002]. Hacker's Delight (2 ed.). Addison Wesley - Pearson Education, Inc. pp. 81–96. ISBN 978-0-321-84268-8, 0-321-84268-5.
  2. Knuth, Donald Ervin (2009). "Bitwise tricks & techniques; Binary Decision Diagrams". The Art of Computer Programming. Vol. 4, Fascicle 1. Addison–Wesley Professional. ISBN 978-0-321-58050-4. (NB. Draft of Fascicle 1b Archived 2016-03-12 at the Wayback Machine available for download.)
  3. Hewlett-Packard HP-16C Computer Scientist Owner's Handbook (PDF). Hewlett-Packard Company. April 1982. 00016-90001. Archived (PDF) from the original on 2017-03-28. Retrieved 2017-03-28.
  4. Archived 2019-09-30 at the Wayback Machine, written in Fōrmulæ. The Fōrmulæ wiki. Retrieved 2019-09-30.
  5. A solution to the task Population count. Retrieved 2019-09-30.
  6. Rosetta Code. Retrieved 2019-09-30.
  7. Thompson, Thomas M. (1983). From Error-Correcting Codes through Sphere Packings to Simple Groups. The Carus Mathematical Monographs #21. The Mathematical Association of America. p. 33.
  8. Glaisher, James Whitbread Lee (1899). "On the residue of a binomial-theorem coefficient with respect to a prime modulus". The Quarterly Journal of Pure and Applied Mathematics. 30: 150–156. (NB. See in particular the final paragraph of p. 156.)
  9. Reed, Irving Stoy (1954). "A Class of Multiple-Error-Correcting Codes and the Decoding Scheme". IRE Professional Group on Information Theory. PGIT-4. Institute of Radio Engineers (IRE): 38–49.
  10. Cohen, Gérard D.; Lobstein, Antoine; Naccache, David; Zémor, Gilles (1998). "How to improve an exponentiation black-box". In Nyberg, Kaisa (ed.). Advances in Cryptology – EUROCRYPT '98, International Conference on the Theory and Application of Cryptographic Techniques, Espoo, Finland, May 31 – June 4, 1998, Proceeding. Lecture Notes in Computer Science. Vol. 1403. Springer. pp. 211–220. doi:10.1007/BFb0054128. ISBN 978-3-540-64518-4.
  11. Stoica, I.; Morris, R.; Liben-Nowell, D.; Karger, D. R.; Kaashoek, M. F.; Dabek, F.; Balakrishnan, H. (February 2003). "Chord: a scalable peer-to-peer lookup protocol for internet applications". IEEE/ACM Transactions on Networking. 11 (1): 17–32. doi:10.1109/TNET.2002.808407. S2CID 221276912. Section 6.3: "In general, the number of fingers we need to follow will be the number of ones in the binary representation of the distance from node to query."
  12. SPARC International, Inc. (1992). "A.41: Population Count. Programming Note". The SPARC architecture manual: version 8 (Version 8 ed.). Englewood Cliffs, New Jersey, USA: Prentice Hall. p. 231. ISBN 0-13-825001-4.
  13. Blaxell, David (1978). Hogben, David; Fife, Dennis W. (eds.). "Record linkage by bit pattern matching". Computer Science and Statistics--Tenth Annual Symposium on the Interface. NBS Special Publication. 503. U.S. Department of Commerce / National Bureau of Standards: 146–156.
  14. Wegner, Peter (May 1960). "A technique for counting ones in a binary computer". Communications of the ACM. 3 (5): 322. doi:10.1145/367236.367286. S2CID 31683715.
  15. Muła, Wojciech; Kurz, Nathan; Lemire, Daniel (January 2018). "Faster Population Counts Using AVX2 Instructions". Computer Journal. 61 (1): 111–120. arXiv:1611.07612. doi:10.1093/comjnl/bxx046. S2CID 540973.
  16. Stern & Mahmoud, Communications System Design, Prentice Hall, 2004, p 477ff.
  17. "GCC 3.4 Release Notes". GNU Project.
  18. "LLVM 1.5 Release Notes". LLVM Project.
  19. "What's New In Python 3.10". python.org.
  20. "GHC 7.4.1 release notes". GHC documentation.
  21. "Chapter 12.11. Bit Functions — MySQL 5.0 Reference Manual".
  22. Metcalf, Michael; Reid, John; Cohen, Malcolm (2011). Modern Fortran Explained. Oxford University Press. p. 380. ISBN   978-0-19-960142-4.
  23. Schwartz, Jake; Grevelle, Rick (2003-10-20) [1993]. HP16C Emulator Library for the HP48S/SX. 1.20 (1 ed.). Retrieved 2015-08-15. (NB. This library also works on the HP 48G/GX/G+. Beyond the feature set of the HP-16C this package also supports calculations for binary, octal, and hexadecimal floating-point numbers in scientific notation in addition to the usual decimal floating-point numbers.)
  24. Bonin, Walter (2019) [2015]. WP 43S Owner's Manual (PDF). 0.12 (draft ed.). p. 135. ISBN 978-1-72950098-9. Retrieved 2019-08-05. (314 pages)
  25. Bonin, Walter (2019) [2015]. WP 43S Reference Manual (PDF). 0.12 (draft ed.). pp. xiii, 104, 115, 120, 188. ISBN 978-1-72950106-1. Retrieved 2019-08-05. (271 pages)
  26. Martin, Ángel M.; McClure, Greg J. (2015-09-05). "HP16C Emulator Module for the HP-41CX - User's Manual and QRG" (PDF). Archived (PDF) from the original on 2017-04-27. Retrieved 2017-04-27. (NB. Beyond the HP-16C feature set this custom library for the HP-41CX extends the functionality of the calculator by about 50 additional functions.)
  27. Martin, Ángel M. (2015-09-07). "HP-41: New HP-16C Emulator available". Archived from the original on 2017-04-27. Retrieved 2017-04-27.
  28. Thörngren, Håkan (2017-01-10). "Ladybug Documentation" (release 0A ed.). Retrieved 2017-01-29.
  29. "New HP-41 module available: Ladybug". 2017-01-10. Archived from the original on 2017-01-29. Retrieved 2017-01-29.
  30. Dale, Paul; Bonin, Walter (2012) [2008]. "WP 34S Owner's Manual" (PDF) (3.1 ed.). Retrieved 2017-04-27.
  31. Bonin, Walter (2015) [2008]. WP 34S Owner's Manual (3.3 ed.). CreateSpace Independent Publishing Platform. ISBN   978-1-5078-9107-0.
  32. "Free Pascal documentation popcnt". Retrieved 2019-12-07.
  33. "JDK-6378821: bitCount() should use POPC on SPARC processors and AMD+10h". Java bug database. 2006-01-30.
  34. Blackfin Instruction Set Reference (Preliminary ed.). Analog Devices. 2001. pp. 8–24. Part Number 82-000410-14.
  35. Wolf, Claire (2019-03-22). "RISC-V "B" Bit Manipulation Extension for RISC-V, Draft v0.37" (PDF). Github.

Further reading