Wallace tree

Last updated January 28, 2024

A Wallace multiplier is a hardware implementation of a binary multiplier, a digital circuit that multiplies two integers. It uses a selection of full and half adders (the Wallace tree or Wallace reduction) to sum partial products in stages until two numbers are left. Wallace multipliers reduce as much as possible on each layer, whereas Dadda multipliers try to minimize the required number of gates by postponing the reduction to the upper layers.^[1]

Compared to naively adding partial products with regular adders, the benefit of the Wallace tree is its faster speed. It has $O(\log n)$ reduction layers, but each layer has only $O(1)$ propagation delay. A naive addition of partial products would require $O(\log ^{2}n)$ time. As making the partial products is $O(1)$ and the final addition is $O(\log n)$ , the total multiplication is $O(\log n)$ , not much slower than addition. From a complexity theoretic perspective, the Wallace tree algorithm puts multiplication in the class NC¹. The downside of the Wallace tree, compared to naive addition of partial products, is its much higher gate count.

These computations only consider gate delays and don't deal with wire delays, which can also be very substantial.

The Wallace tree can be also represented by a tree of 3/2 or 4/2 adders.

It is sometimes combined with Booth encoding.^[4]^[5]

Detailed explanation

The Wallace tree is a variant of long multiplication. The first step is to multiply each digit (each bit) of one factor by each digit of the other. Each of these partial products has weight equal to the product of its factors. The final product is calculated by the weighted sum of all these partial products.

The first step, as said above, is to multiply each bit of one number by each bit of the other, which is accomplished as a simple AND gate, resulting in $n^{2}$ bits; the partial product of bits $a_{m}$ by $b_{n}$ has weight $2^{(m+n)}$

In the second step, the resulting bits are reduced to two numbers; this is accomplished as follows: As long as there are three or more wires with the same weight add a following layer:-

Take any three wires with the same weights and input them into a full adder. The result will be an output wire of the same weight and an output wire with a higher weight for each three input wires.
If there are two wires of the same weight left, input them into a half adder.
If there is just one wire left, connect it to the next layer.

In the third and final step, the two resulting numbers are fed to an adder, obtaining the final product.

Example

$n=4$ , multiplying $a_{3}a_{2}a_{1}a_{0}$ by $b_{3}b_{2}b_{1}b_{0}$ :

First we multiply every bit by every bit:
- weight 1 – $a_{0}b_{0}$
- weight 2 – $a_{0}b_{1}$ , $a_{1}b_{0}$
- weight 4 – $a_{0}b_{2}$ , $a_{1}b_{1}$ , $a_{2}b_{0}$
- weight 8 – $a_{0}b_{3}$ , $a_{1}b_{2}$ , $a_{2}b_{1}$ , $a_{3}b_{0}$
- weight 16 – $a_{1}b_{3}$ , $a_{2}b_{2}$ , $a_{3}b_{1}$
- weight 32 – $a_{2}b_{3}$ , $a_{3}b_{2}$
- weight 64 – $a_{3}b_{3}$
Reduction layer 1:
- Pass the only weight-1 wire through, output: 1 weight-1 wire
- Add a half adder for weight 2, outputs: 1 weight-2 wire, 1 weight-4 wire
- Add a full adder for weight 4, outputs: 1 weight-4 wire, 1 weight-8 wire
- Add a full adder for weight 8, and pass the remaining wire through, outputs: 2 weight-8 wires, 1 weight-16 wire
- Add a full adder for weight 16, outputs: 1 weight-16 wire, 1 weight-32 wire
- Add a half adder for weight 32, outputs: 1 weight-32 wire, 1 weight-64 wire
- Pass the only weight-64 wire through, output: 1 weight-64 wire
Wires at the output of reduction layer 1:MESCOE
- weight 1 – 1
- weight 2 – 1
- weight 4 – 2
- weight 8 – 3
- weight 16 – 2
- weight 32 – 2
- weight 64 – 2
Reduction layer 2:
- Add a full adder for weight 8, and half adders for weights 4, 16, 32, 64
Outputs:
- weight 1 – 1
- weight 2 – 1
- weight 4 – 1
- weight 8 – 2
- weight 16 – 2
- weight 32 – 2
- weight 64 – 2
- weight 128 – 1
Group the wires into a pair of integers and an adder to add them.

Related Research Articles

In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code is Huffman coding, an algorithm developed by David A. Huffman while he was a Sc.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".

In computational complexity theory, the class NC (for "Nick's Class") is the set of decision problems decidable in polylogarithmic time on a parallel computer with a polynomial number of processors. In other words, a problem with input size n is in NC if there exist constants c and k such that it can be solved in time $O ((log n) c)$ using $O (n k)$ parallel processors. Stephen Cook coined the name "Nick's class" after Nick Pippenger, who had done extensive research on circuits with polylogarithmic depth and polynomial size.

A multiplication algorithm is an algorithm to multiply two numbers. Depending on the size of the numbers, different algorithms are more efficient than others. Efficient multiplication algorithms have existed since the advent of the decimal numeral system.

In machine learning, the perceptron is an algorithm for supervised learning of binary classifiers. A binary classifier is a function which can decide whether or not an input, represented by a vector of numbers, belongs to some specific class. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector.

A numerically controlled oscillator (NCO) is a digital signal generator which creates a synchronous, discrete-time, discrete-valued representation of a waveform, usually sinusoidal. NCOs are often used in conjunction with a digital-to-analog converter (DAC) at the output to create a direct digital synthesizer (DDS).

A barrel shifter is a digital circuit that can shift a data word by a specified number of bits without the use of any sequential logic, only pure combinational logic, i.e. it inherently provides a binary operation. It can however in theory also be used to implement unary operations, such as logical shift left, in cases where limited by a fixed amount. One way to implement a barrel shifter is as a sequence of multiplexers where the output of one multiplexer is connected to the input of the next multiplexer in a way that depends on the shift distance. A barrel shifter is often used to shift and rotate n-bits in modern microprocessors, typically within a single clock cycle.

An adder, or summer, is a digital circuit that performs addition of numbers. In many computers and other kinds of processors, adders are used in the arithmetic logic units (ALUs). They are also used in other parts of the processor, where they are used to calculate addresses, table indices, increment and decrement operators and similar operations.

In Boolean algebra, any Boolean function can be expressed in the canonical disjunctive normal form (CDNF) or minterm canonical form, and its dual, the canonical conjunctive normal form (CCNF) or maxterm canonical form. Other canonical forms include the complete sum of prime implicants or Blake canonical form, and the algebraic normal form.

In modular arithmetic computation, Montgomery modular multiplication, more commonly referred to as Montgomery multiplication, is a method for performing fast modular multiplication. It was introduced in 1985 by the American mathematician Peter L. Montgomery.

In machine learning, backpropagation is a gradient estimation method used to train neural network models. The gradient estimate is used by the optimization algorithm to compute the network parameter updates.

Booth's multiplication algorithm is a multiplication algorithm that multiplies two signed binary numbers in two's complement notation. The algorithm was invented by Andrew Donald Booth in 1950 while doing research on crystallography at Birkbeck College in Bloomsbury, London. Booth's algorithm is of interest in the study of computer architecture.

A carry-skip adder is an adder implementation that improves on the delay of a ripple-carry adder with little effort compared to other adders. The improvement of the worst-case delay is achieved by using several carry-skip adders to form a block-carry-skip adder.

A division algorithm is an algorithm which, given two integers N and D, computes their quotient and/or remainder, the result of Euclidean division. Some are applied by hand, while others are employed by digital circuit designs and software.

The Dadda multiplier is a hardware binary multiplier design invented by computer scientist Luigi Dadda in 1965. It uses a selection of full and half adders to sum the partial products in stages until two numbers are left. The design is similar to the Wallace multiplier, but the different reduction tree reduces the required number of gates and makes it slightly faster.

A carry-save adder is a type of digital adder, used to efficiently compute the sum of three or more binary numbers. It differs from other digital adders in that it outputs two numbers, and the answer of the original summation can be achieved by adding these outputs together. A carry save adder is typically used in a binary multiplier, since a binary multiplier involves addition of more than two binary numbers after multiplication. A big adder implemented using this technique will usually be much faster than conventional addition of those numbers.

A binary multiplier is an electronic circuit used in digital electronics, such as a computer, to multiply two binary numbers.

The softmax function, also known as softargmax or normalized exponential function, converts a vector of $K$ real numbers into a probability distribution of $K$ possible outcomes. It is a generalization of the logistic function to multiple dimensions, and used in multinomial logistic regression. The softmax function is often used as the last activation function of a neural network to normalize the output of a network to a probability distribution over predicted output classes, based on Luce's choice axiom.

The Karatsuba algorithm is a fast multiplication algorithm. It was discovered by Anatoly Karatsuba in 1960 and published in 1962. It is a divide-and-conquer algorithm that reduces the multiplication of two n-digit numbers to three multiplications of n/2-digit numbers and, by repeating this reduction, to at most $single-digit multiplications. It is therefore asymptotically faster than the traditional algorithm, which performs single-digit products.$

In computer science, multiply-with-carry (MWC) is a method invented by George Marsaglia for generating sequences of random integers based on an initial set from two to many thousands of randomly chosen seed values. The main advantages of the MWC method are that it invokes simple computer integer arithmetic and leads to very fast generation of sequences of random numbers with immense periods, ranging from around $to .$

In cryptography, SWIFFT is a collection of provably secure hash functions. It is based on the concept of the fast Fourier transform (FFT). SWIFFT is not the first hash function based on FFT, but it sets itself apart by providing a mathematical proof of its security. It also uses the LLL basis reduction algorithm. It can be shown that finding collisions in SWIFFT is at least as difficult as finding short vectors in cyclic/ideal lattices in the worst case. By giving a security reduction to the worst-case scenario of a difficult mathematical problem, SWIFFT gives a much stronger security guarantee than most other cryptographic hash functions.

References

↑ Townsend, Whitney J.; Swartzlander, Earl E.; Abraham, Jacob A. (2003). Luk, Franklin T. (ed.). "A comparison of Dadda and Wallace multiplier delays". Advanced Signal Processing Algorithms, Architectures, and Implementations XIII. 5205: 552–560. Bibcode:2003SPIE.5205..552T. doi:10.1117/12.507012. ISSN 0277-786X. S2CID 121437680.
↑ Wallace, Christopher Stewart (February 1964). "A suggestion for a fast multiplier". IEEE Transactions on Electronic Computers . EC-13 (1): 14–17. doi:10.1109/PGEC.1964.263830. S2CID 34688264.
↑ Bohsali, Mounir; Doan, Michael (2010). "Rectangular Styled Wallace Tree Multipliers" (PDF). Archived from the original (PDF) on 2010-02-15.
↑ "Introduction". 8x8 Booth Encoded Wallace-tree multiplier. Tufts university. 2007. Archived from the original on 2010-06-17.
↑ Weems Jr., Charles C. (2001) [1995]. "CmpSci 535 Discussion 7: Number Representations". Amherst: University of Massachusetts. Archived from the original on 2011-02-06.

External links

Generic VHDL Implementation of Wallace Tree Multiplier.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] Townsend, Whitney J.; Swartzlander, Earl E.; Abraham, Jacob A. (2003). Luk, Franklin T. (ed.). "A comparison of Dadda and Wallace multiplier delays". Advanced Signal Processing Algorithms, Architectures, and Implementations XIII. 5205: 552–560. Bibcode:2003SPIE.5205..552T. doi:10.1117/12.507012. ISSN 0277-786X. S2CID 121437680.

[Wallace_1964-2] Wallace, Christopher Stewart (February 1964). "A suggestion for a fast multiplier". IEEE Transactions on Electronic Computers . EC-13 (1): 14–17. doi:10.1109/PGEC.1964.263830. S2CID 34688264.

[Bohsali_2010-3] Bohsali, Mounir; Doan, Michael (2010). "Rectangular Styled Wallace Tree Multipliers" (PDF). Archived from the original (PDF) on 2010-02-15.

[tufts_2007-4] "Introduction". 8x8 Booth Encoded Wallace-tree multiplier. Tufts university. 2007. Archived from the original on 2010-06-17.

[Weems_2001-5] Weems Jr., Charles C. (2001) [1995]. "CmpSci 535 Discussion 7: Number Representations". Amherst: University of Massachusetts. Archived from the original on 2011-02-06.

[1]

[2]

[3]

[4]

[5]

Wallace tree

Contents

Detailed explanation

Example

See also

Related Research Articles

References

Further reading

External links