Carry-save adder

Last updated July 03, 2023

A carry-save adder^[1]^[2]^{[nb 1]} is a type of digital adder, used to efficiently compute the sum of three or more binary numbers. It differs from other digital adders in that it outputs two (or more) numbers, and the answer of the original summation can be achieved by adding these outputs together. A carry save adder is typically used in a binary multiplier, since a binary multiplier involves addition of more than two binary numbers after multiplication. A big adder implemented using this technique will usually be much faster than conventional addition of those numbers.

Motivation

Consider the sum:

   12345678 +  87654322 = 100000000

Using basic arithmetic, we calculate right to left, "8 + 2 = 0, carry 1", "7 + 2 + 1 = 0, carry 1", "6 + 3 + 1 = 0, carry 1", and so on to the end of the sum. Although we know the last digit of the result at once, we cannot know the first digit until we have gone through every digit in the calculation, passing the carry from each digit to the one on its left. Thus adding two n-digit numbers has to take a time proportional to n, even if the machinery we are using would otherwise be capable of performing many calculations simultaneously.

In electronic terms, using bits (binary digits), this means that even if we have n one-bit adders at our disposal, we still have to allow a time proportional to n to allow a possible carry to propagate from one end of the number to the other. Until we have done this,

We do not know the result of the addition.
We do not know whether the result of the addition is larger or smaller than a given number (for instance, we do not know whether it is positive or negative).

A carry look-ahead adder can reduce the delay. In principle the delay can be reduced so that it is proportional to log n, but for large numbers this is no longer the case, because even when carry look-ahead is implemented, the distances that signals have to travel on the chip increase in proportion to n, and propagation delays increase at the same rate. Once we get to the 512-bit to 2048-bit number sizes that are required in public-key cryptography, carry look-ahead is not of much help.

The basic concept

The idea of delaying carry resolution until the end, or saving carries, is due to John von Neumann.^[3]

The sum of two digits can never carry more than a 1, and the sum of two digits plus 1 can also never carry more than 1. For example, in decimal, $9+9=18$ , which carries a 1; $9+9+1=19$ , which also carries a 1. When adding three figures, we can add the first two and produce a sum and the carry digits; then add the sum and the carry digits to the third figure and produce a sum and the carry digits. In binary, the only digits are zero and one, and so $0+0=0$ , $0+1=1$ , and $1+1=0$ with a carried 1. Adding the carry bit can give, at most, $1+1+1=1$ with a carried 1, so a three-way addition is possible. Because of this, it is also possible to add the first three figures and produce the sum and carry; for subsequent figures, the sum and carry are two terms, and the next single figure is added to these.

Here is an example of a binary sum of 3 long binary numbers:

  1011 1010 1010 1101 1111 0000 0000 1101 (a) + 1101 1110 1010 1101 1011 1110 1110 1111 (b) + 0001 0010 1011 0111 0101 0011 0101 0010 (c)

The straightforward way to do it would be to first compute (a+b), and then compute ((a+b)+c). Carry-save arithmetic works by abandoning any kind of carry propagation. It computes the sum digit by digit, as:

  1011 1010 1010 1101 1111 0000 0000 1101 (a) + 1101 1110 1010 1101 1011 1110 1110 1111 (b) + 0001 0010 1011 0111 0101 0011 0101 0010 (c) = 2113 2130 3031 2313 2223 1121 1211 2222

The notation is unconventional, but the result is still unambiguous: Σ2ⁱd_i. If we assume the three numbers to be a, b and c. Then here, the result will be described as the sum of two binary numbers, where the first number, S, is simply the sum obtained by adding the digits (without any carry propagation), i.e. S_i = a_i ⊕ b_i ⊕ c_i and the second number, C, is composed of carries from the previous individual sums, i.e. C_i+1 = (a_ib_i) + (b_ic_i) + (c_ia_i) :

   0111 0110 1011 0111 0001 1101 1011 0000 and  1 0011 0101 0101 1011 1110 0100 1001 1110

Now these 2 numbers can be sent to a carry-propagate adder which will output the result.

This was very advantageous from a delay (computation-time) perspective. If you were to add these 3 numbers using conventional methods, it would take you 2 carry-propagate adder delays to get to the answer. If you use the carry-save technique, you require only 1 carry-propagate adder delay and 1 full-adder delay (which is much lower than a carry-propagate delay). Thus, CSAs are typically very fast.

Carry-save accumulators

Supposing that we have two bits of storage per digit, we can use a redundant binary representation, storing the values 0, 1, 2, or 3 in each digit position. It is therefore obvious that one more binary number can be added to our carry-save result without overflowing our storage capacity: but then what?

The key to success is that at the moment of each partial addition we add three bits:

0 or 1, from the number we are adding.
0 if the digit in our store is 0 or 2, or 1 if it is 1 or 3.
0 if the digit to its right is 0 or 1, or 1 if it is 2 or 3.

To put it another way, we are taking a carry digit from the position on our right, and passing a carry digit to the left, just as in conventional addition; but the carry digit we pass to the left is the result of the previous calculation and not the current one. In each clock cycle, carries only have to move one step along, and not n steps as in conventional addition.

Because signals don't have to move as far, the clock can tick much faster. ..

There is still a need to convert the result to binary at the end of a calculation, which effectively just means letting the carries travel all the way through the number just as in a conventional adder. But if we have done 512 additions in the process of performing a 512-bit multiplication, the cost of that final conversion is effectively split across those 512 additions, so each addition bears 1/512 of the cost of that final "conventional" addition.

Drawbacks

At each stage of a carry-save addition,

We know the result of the addition at once.
We still do not know whether the result of the addition is larger or smaller than a given number (for instance, we do not know whether it is positive or negative).

This latter point is a drawback when using carry-save adders to implement modular multiplication (multiplication followed by division, keeping the remainder only).^[4]^[5] If we cannot know whether the intermediate result is greater or less than the modulus, how can we know whether to subtract the modulus?

Montgomery multiplication, which depends on the rightmost digit of the result, is one solution; though rather like carry-save addition itself, it carries a fixed overhead, so that a sequence of Montgomery multiplications saves time but a single one does not. Fortunately exponentiation, which is effectively a sequence of multiplications, is the most common operation in public-key cryptography.

Careful error analysis^[6] allows a choice to be made about subtracting the modulus even though we don't know for certain whether the result of the addition is big enough to warrant the subtraction. For this to work, it is necessary for the circuit design to be able to add −2, −1, 0, +1 or +2 times the modulus. The advantage over Montgomery multiplication is that there is no fixed overhead attached to each sequence of multiplications.

Technical details

The carry-save unit consists of n full adders, each of which computes a single sum and carry bit based solely on the corresponding bits of the three input numbers. Given the three n-bit numbers a, b, and c, it produces a partial sum ps and a shift-carry sc:

ps_{i}=a_{i}\oplus b_{i}\oplus c_{i},

sc_{i}=(a_{i}\wedge b_{i})\vee (a_{i}\wedge c_{i})\vee (b_{i}\wedge c_{i}).

The entire sum can then be computed by:

Shifting the carry sequence sc left by one place.
Appending a 0 to the front (most significant bit) of the partial sum sequence ps.
Using a ripple carry adder to add these two together and produce the resulting (n + 1)-bit value.

Notes

↑ Carry-save adder is often abbreviated as CSA, however, this can be confused with the carry-skip adder.

Related Research Articles

In computing and electronic systems, binary-coded decimal (BCD) is a class of binary encodings of decimal numbers where each digit is represented by a fixed number of bits, usually four or eight. Sometimes, special bit patterns are used for a sign or other indications.

A multiplication algorithm is an algorithm to multiply two numbers. Depending on the size of the numbers, different algorithms are more efficient than others. Efficient multiplication algorithms have existed since the advent of the decimal system.

Golden ratio base is a non-integer positional numeral system that uses the golden ratio as its base. It is sometimes referred to as base-φ, golden mean base, phi-base, or, colloquially, phinary. Any non-negative real number can be represented as a base-φ numeral using only the digits 0 and 1, and avoiding the digit sequence "11" – this is called a standard form. A base-φ numeral that includes the digit sequence "11" can always be rewritten in standard form, using the algebraic properties of the base φ — most notably that φ¹ + φ⁰ = φ². For instance, 11_φ = 100_φ.

A binary number is a number expressed in the base-2 numeral system or binary numeral system, a method of mathematical expression which uses only two symbols: typically "0" (zero) and "1" (one).

In computer programming, a bitwise operation operates on a bit string, a bit array or a binary numeral at the level of its individual bits. It is a fast and simple action, basic to the higher-level arithmetic operations and directly supported by the processor. Most bitwise operations are presented as two-operand instructions where the result replaces one of the input operands.

Two's complement is a mathematical operation to reversibly convert a positive binary number into a negative binary number with equivalent negative value, using the binary digit with the greatest place value as the sign to indicate whether the binary number is positive or negative. It is used in computer science as the most common method of representing signed integers on computers, and more generally, fixed point binary values. When the most significant bit is 1, the number is signed as negative; and when the most significant bit is 0 the number is signed as positive (see Converting from two's complement representation, below).

An adder, or summer, is a digital circuit that performs addition of numbers. In many computers and other kinds of processors adders are used in the arithmetic logic units (ALUs). They are also used in other parts of the processor, where they are used to calculate addresses, table indices, increment and decrement operators and similar operations.

PiHex was a distributed computing project organized by Colin Percival to calculate specific bits of $π$ . 1,246 contributors used idle time slices on almost two thousand computers to make its calculations. The software used for the project made use of Bellard's formula, a faster version of the BBP formula.

A carry-lookahead adder (CLA) or fast adder is a type of electronics adder used in digital logic. A carry-lookahead adder improves speed by reducing the amount of time required to determine carry bits. It can be contrasted with the simpler, but usually slower, ripple-carry adder (RCA), for which the carry bit is calculated alongside the sum bit, and each stage must wait until the previous carry bit has been calculated to begin calculating its own sum bit and carry bit. The carry-lookahead adder calculates one or more carry bits before the sum, which reduces the wait time to calculate the result of the larger-value bits of the adder.

A Wallace multiplier is a hardware implementation of a binary multiplier, a digital circuit that multiplies two integers. It uses a selection of full and half adders to sum partial products in stages until two numbers are left. Wallace multipliers reduce as much as possible on each layer, whereas Dadda multipliers try to minimize the required number of gates by postponing the reduction to the upper layers.

The Dadda multiplier is a hardware binary multiplier design invented by computer scientist Luigi Dadda in 1965. It uses a selection of full and half adders to sum the partial products in stages until two numbers are left. The design is similar to the Wallace multiplier, but the different reduction tree reduces the required number of gates and makes it slightly faster.

A binary multiplier is an electronic circuit used in digital electronics, such as a computer, to multiply two binary numbers.

The Karatsuba algorithm is a fast multiplication algorithm. It was discovered by Anatoly Karatsuba in 1960 and published in 1962. It is a divide-and-conquer algorithm that reduces the multiplication of two n-digit numbers to three multiplications of n/2-digit numbers and, by repeating this reduction, to at most $single-digit multiplications. It is therefore asymptotically faster than the traditional algorithm, which performs single-digit products.$

In computer science, the double dabble algorithm is used to convert binary numbers into binary-coded decimal (BCD) notation. It is also known as the shift-and-add-3 algorithm, and can be implemented using a small number of gates in computer hardware, but at the expense of high latency.

In electronics, a subtractor – a digital circuit that performs subtraction of numbers – can be designed using the same approach as that of an adder. The binary subtraction process is summarized below. As with an adder, in the general case of calculations on multi-bit numbers, three bits are involved in performing the subtraction for each bit of the difference: the minuend, subtrahend, and a borrow in from the previous bit order position. The outputs are the difference bit and borrow bit $. The subtractor is best understood by considering that the subtrahend and both borrow bits have negative weights, whereas the X and D bits are positive. The operation performed by the subtractor is to rewrite as the sum .$

The serial binary adder or bit-serial adder is a digital circuit that performs binary addition bit by bit. The serial full adder has three single-bit inputs for the numbers to be added and the carry in. There are two single-bit outputs for the sum and carry out. The carry-in signal is the previously calculated carry-out signal. The addition is performed by adding each bit, lowest to highest, one per clock cycle.

In coding theory, a standard array is a $by array that lists all elements of a particular vector space. Standard arrays are used to decode linear codes; i.e. to find the corresponding codeword for any received vector.$

Kochanski multiplication is an algorithm that allows modular arithmetic to be performed efficiently when the modulus is large. This has particular application in number theory and in cryptography: for example, in the RSA cryptosystem and Diffie–Hellman key exchange.

A redundant binary representation (RBR) is a numeral system that uses more bits than needed to represent a single binary digit so that most numbers have several representations. An RBR is unlike usual binary numeral systems, including two's complement, which use a single bit for each digit. Many of an RBR's properties differ from those of regular binary representation systems. Most importantly, an RBR allows addition without using a typical carry. When compared to non-redundant representation, an RBR makes bitwise logical operation slower, but arithmetic operations are faster when a greater bit width is used. Usually, each digit has its own sign that is not necessarily the same as the sign of the number represented. When digits have signs, that RBR is also a signed-digit representation.

The ones' complement of a binary number is the value obtained by inverting all the bits in the binary representation of the number. The name "ones' complement" refers to the fact that such an inverted value, if added to the original, would always produce an "all ones" number. This mathematical operation is primarily of interest in computer science, where it has varying effects depending on how a specific computer represents numbers.

References

↑ Earle, John G. (1965-07-12), Latched Carry Save Adder Circuit for Multipliers, U.S. Patent 3,340,388
↑ Earle, John G. (March 1965), "Latched Carry-Save Adder", IBM Technical Disclosure Bulletin, 7 (10): 909–910
↑ von Neumann, John. Collected Works.
↑ Parhami, Behrooz (2010). Computer arithmetic: algorithms and hardware designs (2nd ed.). New York: Oxford University Press. ISBN 978-0-19-532848-6. OCLC 428033168.
↑ Lyakhov, P.; Valueva, M.; Valuev, G.; Nagornov, N. (2020). "High-Performance Digital Filtering on Truncated Multiply-Accumulate Units in the Residue Number System". IEEE Access. 8: 209181–209190. doi: 10.1109/ACCESS.2020.3038496 . ISSN 2169-3536.
↑ Kochanski, Martin (2003-08-19). "A New Method of Serial Modular Multiplication" (PDF). Archived from the original (PDF) on 2018-07-16. Retrieved 2018-07-16.