Operation reduction for low power

Operation reduction for low power is a program-transformation technique used in application-specific integrated circuit (ASIC) design to reduce the power consumed by a specific application. A program transformation is any change to the computational structure of a program, such as the nature and type of its computational modules, their interconnections, or the sequencing of operations, that keeps the input-output behavior intact. Operation reduction lowers the number of operations needed to perform a task, which reduces the hardware required and, in turn, the power consumption. For example, in a given application-specific IC, reducing the number of independent additions required automatically reduces the number of adders needed and hence the power consumed.

Operation substitution

Operation substitution is an operation-reduction technique in which certain costly operations are substituted by relatively cheaper ones, reducing power consumption. Some typical examples of operation substitution are the following:

  1. Multiplication by adds/subtracts: Multiplying two numbers is costly compared to adding them, so substituting addition for multiplication is profitable. For example, to calculate y = x² + Ax + B we can compute x², compute Ax, and add both to B, which takes two multiplications and two additions; or we can rewrite it as y = x(x + A) + B, computing x + A, multiplying the result by x, and adding B, which takes one multiplication and two additions. Both forms have the same critical-path length, but the second has fewer multiplications, which saves power (a C sketch follows this list).
  2. Computation of sine/cosine/tangent: Computing trigonometric functions can also be quite costly, whereas substituting a low-order Taylor expansion makes them less power-consuming at the cost of accuracy, a trade-off one should keep in mind (an example approximation is sketched after this list).
  3. Multiply-add by MAC: The multiply-accumulate operation is a common step that computes the product of two numbers and adds that product to an accumulator. The hardware used for this purpose is called a multiplier-accumulator (MAC); it performs the multiplication and addition in a single unit, and using MACs also decreases the power consumed (see the dot-product sketch below).
  4. Reducing memory access: Restructuring the program so that operations requiring frequent memory access are replaced by ones needing fewer accesses is also profitable, since memory access is a costly operation (the last sketch below hoists a repeated load out of a loop).
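As a concrete illustration of the first technique, the following C sketch (the function names are illustrative assumptions, not part of the original presentation) contrasts the two evaluations of y = x² + Ax + B:

    /* Direct form: two multiplications (x*x and A*x) and two additions. */
    double poly_direct(double x, double A, double B) {
        return x * x + A * x + B;
    }

    /* Restructured form: one multiplication and two additions. */
    double poly_restructured(double x, double A, double B) {
        return x * (x + A) + B;
    }

In hardware, the restructured form produces the same result with one fewer multiplier.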
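For the trigonometric substitution, a minimal sketch of one possible low-order approximation is given below; the degree-7 truncation and the function name are assumptions chosen for illustration:

    /* Degree-7 Taylor approximation of sin(x) about 0:
       sin(x) ~ x - x^3/3! + x^5/5! - x^7/7!.
       Cheap in operations, but accurate only for small |x|. */
    double sin_taylor7(double x) {
        double x2 = x * x;
        /* Horner form over x2 keeps the multiplication count low. */
        return x * (1.0 + x2 * (-1.0 / 6.0 + x2 * (1.0 / 120.0 - x2 / 5040.0)));
    }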
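For the MAC substitution, a dot product is the canonical use. The sketch below uses the C99 fma() routine from math.h as a software stand-in for a hardware multiplier-accumulator:

    #include <math.h>

    /* Accumulates sum(x[i] * y[i]) with one fused multiply-add per
       element instead of a separate multiply followed by an add. */
    double dot(const double *x, const double *y, int n) {
        double acc = 0.0;
        for (int i = 0; i < n; i++)
            acc = fma(x[i], y[i], acc);
        return acc;
    }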
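For memory-access reduction, the sketch below (the function names and fixed row width are illustrative assumptions) hoists a repeated load out of an inner loop; optimizing compilers often do this automatically, but in hardware synthesis the restructuring is made explicit:

    /* Before: a[i] is read from memory on every inner iteration. */
    void scale_rows_naive(double m[][4], const double *a, int rows) {
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < 4; j++)
                m[i][j] *= a[i];
    }

    /* After: a[i] is read once per row and kept in a local. */
    void scale_rows_hoisted(double m[][4], const double *a, int rows) {
        for (int i = 0; i < rows; i++) {
            double ai = a[i];    /* single memory access per row */
            for (int j = 0; j < 4; j++)
                m[i][j] *= ai;
        }
    }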

Butterfly example

A popular example of operation substitution is the butterfly computation. Here we need to compute two values, yr = ar*xr - ai*xi and yi = ai*xr + ar*xi, which can be done by sequentially computing the terms exactly as written, taking four multiplications and two additions/subtractions. Using operation substitution we can instead compute yr = ar*(xr + xi) - xi*(ai + ar) and yi = ar*(xr + xi) + xr*(ai - ar), where the term (xr + xi), once computed, is shared by both expressions; since the coefficients ar and ai are typically constants known in advance (for example, FFT twiddle factors), (ai + ar) and (ai - ar) can be precomputed. The number of operations thus changes from four multiplications to three, and from two additions/subtractions to three. The critical path in the first method has length 2, whereas in the second it is 3, so again this is a trade-off between delay and power (a C sketch follows).
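A minimal C sketch of the two formulations is given below; the function names are illustrative, and the parameters a_sum and a_diff stand in for the precomputed constants (ai + ar) and (ai - ar):

    /* Direct form: 4 multiplications, 2 additions/subtractions. */
    void butterfly_direct(double ar, double ai, double xr, double xi,
                          double *yr, double *yi) {
        *yr = ar * xr - ai * xi;
        *yi = ai * xr + ar * xi;
    }

    /* Substituted form: 3 multiplications, 3 additions/subtractions,
       assuming a_sum = ai + ar and a_diff = ai - ar are precomputed. */
    void butterfly_substituted(double ar, double xr, double xi,
                               double a_sum, double a_diff,
                               double *yr, double *yi) {
        double t = xr + xi;    /* shared subexpression */
        *yr = ar * t - xi * a_sum;
        *yi = ar * t + xr * a_diff;
    }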

Switching activity reduction

Switching activity can be reduced by structuring the computation around how frequently each input changes: if certain inputs change rarely, they should be made to operate in a single module, so that that module stays relatively passive compared to the others. For example, A + B + C + D can be computed as ((A + B) + C) + D or as (A + B) + (C + D). The first grouping feeds C and D into two separate adders, but if they change relatively slowly, feeding them into the same adder, as in the second grouping, is more profitable (compare the sketch below).
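The two groupings can be written as follows; the C functions are only a stand-in for the adder-tree structure, since in hardware each '+' maps onto a physical adder whose inputs toggle:

    /* ((A + B) + C) + D: C and D enter two separate adders, and each
       of those adders also sees the frequently changing A + B path. */
    int sum_chain(int A, int B, int C, int D) {
        return ((A + B) + C) + D;
    }

    /* (A + B) + (C + D): the slowly changing inputs C and D share one
       adder, which switches only when C or D changes. */
    int sum_grouped(int A, int B, int C, int D) {
        return (A + B) + (C + D);
    }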

Power-aware scheduling and binding

Any synthesis flow has three parts: allocation (deciding the number and type of resources), scheduling (ordering the operations in time), and binding (mapping the operations onto the circuit). Operations can be scheduled in a particular order based on which values in the program activate how many modules. In general, operations that require more preceding operations to complete beforehand should be scheduled later.

Exploiting mutual exclusion

Consider the following code snippet:

if (C > 0) {
    A = A * C;
}

Let us assume that profiling has shown that the most likely value of C is 2. Since the conditions C == 2 and C > 0 (with C != 2) are mutually exclusive, we can modify the code to be

if (C == 2) {
    A = A << 1;
} else if (C > 0) {
    A = A * C;
}

Here the multiplication is replaced by a shift operation, which is triggered in most cases and is far cheaper than multiplication.
