Toom–Cook multiplication

Toom–Cook, sometimes known as Toom-3, is a multiplication algorithm for large integers. It is named after Andrei Toom, who introduced the algorithm and its low complexity, and Stephen Cook, who cleaned up and improved its description.

Given two large integers, a and b, Toom–Cook splits up a and b into k smaller parts each of length l, and performs operations on the parts. As k grows, one may combine many of the multiplication sub-operations, thus reducing the overall complexity of the algorithm. The multiplication sub-operations can then be computed recursively using Toom–Cook multiplication again, and so on. Although the terms "Toom-3" and "Toom–Cook" are sometimes incorrectly used interchangeably, Toom-3 is only a single instance of the Toom–Cook algorithm, where k = 3.

Toom-3 reduces 9 multiplications to 5, and runs in Θ(n^(log 5 / log 3)) ≈ Θ(n^1.46). In general, Toom-k runs in Θ(c(k) n^e), where e = log(2k − 1) / log(k); n^e is the time spent on sub-multiplications, and c(k) accounts for the additions and multiplications by small constants. [1] The Karatsuba algorithm is a special case of Toom–Cook, where the number is split into two smaller ones. It reduces 4 multiplications to 3 and so operates at Θ(n^(log 3 / log 2)) ≈ Θ(n^1.58). Ordinary long multiplication is equivalent to Toom-1, with complexity Θ(n^2).
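
To make the exponent formula concrete, here is a short Python sketch (ours, not part of the original description) that computes e = log(2k − 1) / log(k) for small k:

    import math

    # Toom-k performs 2k - 1 sub-multiplications instead of the k^2
    # needed by schoolbook multiplication, giving exponent e below.
    for k in (2, 3, 4):
        e = math.log(2 * k - 1) / math.log(k)
        print(f"Toom-{k}: e = {e:.4f}")
    # Prints e ≈ 1.5850 (Karatsuba), 1.4650 (Toom-3), 1.4037 (Toom-4).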

Although the exponent e can be set arbitrarily close to 1 by increasing k, the function c unfortunately grows very rapidly. [1] [2] The growth rate for mixed-level Toom–Cook schemes was still an open research problem in 2005. [3] An implementation described by Donald Knuth achieves the time complexity Θ(n 2^(√(2 log n)) log n). [4]

Due to its overhead, Toom–Cook is slower than long multiplication for small numbers; it is therefore typically used for intermediate-size multiplications, before the asymptotically faster Schönhage–Strassen algorithm (with complexity Θ(n log n log log n)) becomes practical.

Toom first described this algorithm in 1963, and Cook published an improved (asymptotically equivalent) algorithm in his PhD thesis in 1966. [5]

Details

This section discusses exactly how to perform Toom-k for any given value of k, and is a simplification of the Toom–Cook polynomial multiplication described by Marco Bodrato. [6] The algorithm has five main steps:

  1. Splitting
  2. Evaluation
  3. Pointwise multiplication
  4. Interpolation
  5. Recomposition

In a typical large integer implementation, each integer is represented as a sequence of digits in positional notation, with the base or radix set to some (typically large) value b; for this example we use b = 10000, so that each digit corresponds to a group of four decimal digits (in a computer implementation, b would typically be a power of 2 instead). Say the two integers being multiplied are:

m = 1234567890123456789012
n = 987654321987654321098.

These are much smaller than would normally be processed with Toom–Cook (grade-school multiplication would be faster) but they will serve to illustrate the algorithm.

Splitting

The first step is to select the base B = b^i, such that the number of digits of both m and n in base B is at most k (e.g., 3 in Toom-3). A typical choice for i is given by:

i = max(⌊(logb m)/k⌋, ⌊(logb n)/k⌋) + 1,

where logb denotes the base-b logarithm (so that logb x + 1 approximates the number of base-b digits of x).

In our example we'll be doing Toom-3, so we choose B = b^2 = 10^8. We then separate m and n into their base-B digits mi, ni:

m2 = 123456, m1 = 78901234, m0 = 56789012
n2 = 98765, n1 = 43219876, n0 = 54321098.

We then use these digits as coefficients in degree-(k − 1) polynomials p and q, with the property that p(B) = m and q(B) = n:

p(x) = m2 x^2 + m1 x + m0
q(x) = n2 x^2 + n1 x + n0.

The purpose of defining these polynomials is that if we can compute their product r(x) = p(x)q(x), our answer will be r(B) = m × n.
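
As a minimal sketch of this step in Python, assuming the example's b = 10^4 and B = b^2 = 10^8 (the helper name split is ours, for illustration):

    B = 10**8  # base for splitting, B = b^2 with b = 10^4

    def split(x, k=3):
        """Return the k base-B digits of x, least significant first."""
        digits = []
        for _ in range(k):
            x, d = divmod(x, B)
            digits.append(d)
        return digits

    m = 1234567890123456789012
    n = 987654321987654321098
    m0, m1, m2 = split(m)  # 56789012, 78901234, 123456
    n0, n1, n2 = split(n)  # 54321098, 43219876, 98765
    # The defining property p(B) = m holds:
    assert m0 + m1 * B + m2 * B**2 == m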

In the case where the numbers being multiplied are of different sizes, it's useful to use different values of k for m and n, which we'll call km and kn. For example, the algorithm "Toom-2.5" refers to Toom–Cook with km = 3 and kn = 2. In this case the i in B = b^i is typically chosen by:

i = max(⌊(logb m)/km⌋, ⌊(logb n)/kn⌋) + 1.

Evaluation

The Toom–Cook approach to computing the polynomial product p(x)q(x) is a commonly used one. Note that a polynomial of degree d is uniquely determined by d + 1 points (for example, a line, a polynomial of degree one, is specified by two points). The idea is to evaluate p(·) and q(·) at various points, multiply their values at these points to obtain points on the product polynomial, and finally interpolate to find its coefficients.

Since deg(pq) = deg(p) + deg(q), we will need deg(p) + deg(q) + 1 = km + kn − 1 points to determine the final result. Call this d. In the case of Toom-3, d = 5. The algorithm will work no matter what points are chosen (with a few small exceptions, see matrix invertibility requirement in Interpolation), but in the interest of simplifying the algorithm it's better to choose small integer values like 0, 1, −1, and −2.

One unusual point value that is frequently used is infinity, written ∞ or 1/0. To "evaluate" a polynomial p at infinity actually means to take the limit of p(x)/x^(deg p) as x goes to infinity. Consequently, p(∞) is always the value of its highest-degree coefficient (in the example above, the coefficient m2).

In our Toom-3 example, we will use the points 0, 1, −1, −2, and ∞. These choices simplify evaluation, producing the formulas:

p(0) = m0
p(1) = m0 + m1 + m2
p(−1) = m0 − m1 + m2
p(−2) = m0 − 2m1 + 4m2
p(∞) = m2

and analogously for q. In our example, the values we get are:

p(0) = m0 = 56789012
p(1) = m0 + m1 + m2 = 56789012 + 78901234 + 123456 = 135813702
p(−1) = m0 − m1 + m2 = 56789012 − 78901234 + 123456 = −21988766
p(−2) = m0 − 2m1 + 4m2 = 56789012 − 2 × 78901234 + 4 × 123456 = −100519632
p(∞) = m2 = 123456
q(0) = n0 = 54321098
q(1) = n0 + n1 + n2 = 54321098 + 43219876 + 98765 = 97639739
q(−1) = n0 − n1 + n2 = 54321098 − 43219876 + 98765 = 11199987
q(−2) = n0 − 2n1 + 4n2 = 54321098 − 2 × 43219876 + 4 × 98765 = −31723594
q(∞) = n2 = 98765.

As shown, these values may be negative.
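
The straightforward evaluation translates directly into code; this sketch (the helper name evaluate is ours) continues the splitting sketch above:

    # Evaluate a quadratic c0 + c1*x + c2*x^2 at the Toom-3 points
    # 0, 1, -1, -2, ∞; "evaluation at ∞" returns the leading coefficient.
    def evaluate(c0, c1, c2):
        return (c0,                    # value at 0
                c0 + c1 + c2,          # value at 1
                c0 - c1 + c2,          # value at -1
                c0 - 2 * c1 + 4 * c2,  # value at -2
                c2)                    # value at ∞

    p_vals = evaluate(m0, m1, m2)  # (56789012, 135813702, -21988766, -100519632, 123456)
    q_vals = evaluate(n0, n1, n2)  # (54321098, 97639739, 11199987, -31723594, 98765)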

For the purpose of later explanation, it will be useful to view this evaluation process as a matrix-vector multiplication, where each row of the matrix contains powers of one of the evaluation points, and the vector contains the coefficients of the polynomial:

( p(0)  )   ( 1   0   0 )
( p(1)  )   ( 1   1   1 )   ( m0 )
( p(−1) ) = ( 1  −1   1 ) × ( m1 )
( p(−2) )   ( 1  −2   4 )   ( m2 )
( p(∞)  )   ( 0   0   1 )

The dimensions of the matrix are d by km for p and d by kn for q. The row for infinity is always all zero except for a 1 in the last column.

Faster evaluation

Multipoint evaluation can be done faster than with the above formulas, reducing the number of elementary operations (additions and subtractions). The sequence given by Bodrato [6] for Toom-3, executed here over the first operand (the polynomial p) of the running example, is the following:

p0 ← m0 + m2 = 56789012 + 123456 = 56912468
p(0) ← m0 = 56789012
p(1) ← p0 + m1 = 56912468 + 78901234 = 135813702
p(−1) ← p0 − m1 = 56912468 − 78901234 = −21988766
p(−2) ← (p(−1) + m2) × 2 − m0 = (−21988766 + 123456) × 2 − 56789012 = −100519632
p(∞) ← m2 = 123456.

This sequence requires five addition/subtraction operations, one fewer than straightforward evaluation. Moreover, the multiplication by 4 in the calculation of p(−2) is avoided.
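
The same sequence can be transcribed into code; this sketch (helper name evaluate_fast, ours) continues the running example:

    # Bodrato's evaluation sequence: five additions/subtractions,
    # and the doubling replaces the multiplication by 4.
    def evaluate_fast(c0, c1, c2):
        t = c0 + c2                  # p0 = m0 + m2
        at1 = t + c1                 # p(1)
        atm1 = t - c1                # p(-1)
        atm2 = (atm1 + c2) * 2 - c0  # p(-2)
        return (c0, at1, atm1, atm2, c2)

    assert evaluate_fast(m0, m1, m2) == p_vals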

Pointwise multiplication

Unlike multiplying the polynomials p(·) and q(·), multiplying the evaluated values p(a) and q(a) just involves multiplying integers, which is a smaller instance of the original problem. We recursively invoke our multiplication procedure to multiply each pair of evaluated points. In practical implementations, as the operands become smaller, the algorithm switches to schoolbook long multiplication. Letting r be the product polynomial, in our example we have:

r(0) = p(0)q(0) = 56789012 × 54321098 = 3084841486175176
r(1) = p(1)q(1) = 135813702 × 97639739 = 13260814415903778
r(−1) = p(−1)q(−1) = −21988766 × 11199987 = −246273893346042
r(−2) = p(−2)q(−2) = −100519632 × −31723594 = 3188843994597408
r(∞) = p(∞)q(∞) = 123456 × 98765 = 12193131840.

As shown, these can also be negative. For large enough numbers, this is the most expensive step, the only step that is not linear in the sizes of m and n.
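
Continuing the sketch, this step is a single loop; Python's built-in multiplication stands in for the recursive calls that a real implementation would make above some size threshold:

    # Multiply the two polynomials point by point. Each product here
    # represents a recursive Toom-Cook (or schoolbook) multiplication.
    r_vals = tuple(pv * qv for pv, qv in zip(p_vals, q_vals))
    # (3084841486175176, 13260814415903778, -246273893346042,
    #  3188843994597408, 12193131840)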

Interpolation

This is the most complex step, the reverse of the evaluation step: given our d points on the product polynomial r(·), we need to determine its coefficients. In other words, we want to solve this matrix equation for the vector on the right-hand side:

( r(0)  )   ( 1   0   0   0   0 )   ( r0 )
( r(1)  )   ( 1   1   1   1   1 )   ( r1 )
( r(−1) ) = ( 1  −1   1  −1   1 ) × ( r2 )
( r(−2) )   ( 1  −2   4  −8  16 )   ( r3 )
( r(∞)  )   ( 0   0   0   0   1 )   ( r4 )

This matrix is constructed the same way as the one in the evaluation step, except that it's d × d. We could solve this equation with a technique like Gaussian elimination, but this is too expensive. Instead, we use the fact that, provided the evaluation points were chosen suitably, this matrix is invertible (see also Vandermonde matrix), and so:

( r0 )   (  1     0    0     0    0 )   ( r(0)  )
( r1 )   ( 1/2   1/3  −1    1/6  −2 )   ( r(1)  )
( r2 ) = ( −1    1/2  1/2    0   −1 ) × ( r(−1) )
( r3 )   ( −1/2  1/6  1/2  −1/6   2 )   ( r(−2) )
( r4 )   (  0     0    0     0    1 )   ( r(∞)  )

All that remains is to compute this matrix-vector product. Although the matrix contains fractions, the resulting coefficients will be integers so this can all be done with integer arithmetic, just additions, subtractions, and multiplication/division by small constants. A difficult design challenge in Toom–Cook is to find an efficient sequence of operations to compute this product; one sequence given by Bodrato [6] for Toom-3 is the following, executed here over the running example:

r0 ← r(0) = 3084841486175176
r4 ← r(∞) = 12193131840
r3 ← (r(−2) − r(1))/3 = (3188843994597408 − 13260814415903778)/3
= −3357323473768790
r1 ← (r(1) − r(−1))/2 = (13260814415903778 − (−246273893346042))/2
= 6753544154624910
r2 ← r(−1) − r(0) = −246273893346042 − 3084841486175176
= −3331115379521218
r3 ← (r2 − r3)/2 + 2r(∞) = (−3331115379521218 − (−3357323473768790))/2 + 2 × 12193131840
= 13128433387466
r2 ← r2 + r1 − r4 = −3331115379521218 + 6753544154624910 − 12193131840
= 3422416581971852
r1 ← r1 − r3 = 6753544154624910 − 13128433387466
= 6740415721237444.

We now know our product polynomial r:

r(x) = 12193131840 x^4 + 13128433387466 x^3 + 3422416581971852 x^2 + 6740415721237444 x + 3084841486175176

If we were using different km, kn, or evaluation points, the matrix and so our interpolation strategy would change; but it does not depend on the inputs and so can be hard-coded for any given set of parameters.
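
Bodrato's sequence above transcribes line by line into code; in this sketch (helper name interpolate, ours) integer division is safe because every division in the sequence is exact:

    # Recover the coefficients r0..r4 of the product polynomial from its
    # values at 0, 1, -1, -2 and ∞. The divisions by 2 and 3 are exact.
    def interpolate(at0, at1, atm1, atm2, atinf):
        r0 = at0
        r4 = atinf
        r3 = (atm2 - at1) // 3
        r1 = (at1 - atm1) // 2
        r2 = atm1 - at0
        r3 = (r2 - r3) // 2 + 2 * atinf
        r2 = r2 + r1 - r4
        r1 = r1 - r3
        return (r0, r1, r2, r3, r4)

    coeffs = interpolate(*r_vals)
    # (3084841486175176, 6740415721237444, 3422416581971852,
    #  13128433387466, 12193131840)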

Recomposition

Finally, we evaluate r(B) to obtain our final answer. This is straightforward since B is a power of b and so the multiplications by powers of B are all shifts by a whole number of digits in base b. In the running example b = 10^4 and B = b^2 = 10^8.

  3084841486175176
+ 6740415721237444 × 10^8
+ 3422416581971852 × 10^16
+ 13128433387466 × 10^24
+ 12193131840 × 10^32

= 1219326312467611632493760095208585886175176

And this is in fact the product of 1234567890123456789012 and 987654321987654321098.
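
Continuing the sketch, recomposition and the final check take only a few lines; each multiplication by B is a digit shift in base b rather than a true multiplication:

    # Evaluate r(B) by Horner's rule; in a radix-2^w implementation each
    # step is a word shift plus an addition.
    result = 0
    for c in reversed(coeffs):
        result = result * B + c
    assert result == m * n  # 1219326312467611632493760095208585886175176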

Interpolation matrices for various k

Here we give interpolation matrices for a few common small values of km and kn.

Toom-1

Toom-1 (km = kn = 1) requires 1 evaluation point, here chosen to be 0. It degenerates to long multiplication, with an interpolation matrix given by the 1 × 1 identity matrix:

( 1 )

Toom-1.5

Toom-1.5 (km = 2, kn = 1) requires 2 evaluation points, here chosen to be 0 and ∞. Its interpolation matrix is then the 2 × 2 identity matrix:

( 1  0 )
( 0  1 )

This also degenerates to long multiplication: both coefficients of one factor are multiplied by the sole coefficient of the other factor.

Toom-2

Toom-2 (km = 2, kn = 2) requires 3 evaluation points, here chosen to be 0, 1, and ∞. It is the same as Karatsuba multiplication, with an interpolation matrix of:

(  1   0   0 )
( −1   1  −1 )
(  0   0   1 )
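
For comparison, the Toom-2 interpolation collapses to one subtraction chain, which makes the connection to Karatsuba explicit (a sketch with our own naming):

    # Karatsuba/Toom-2: from r(0), r(1), r(∞) recover r0 + r1*x + r2*x^2.
    def interpolate_toom2(at0, at1, atinf):
        return (at0, at1 - at0 - atinf, atinf)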

Toom-2.5

Toom-2.5 (km = 3, kn = 2) requires 4 evaluation points, here chosen to be 0, 1, −1, and ∞. It then has an interpolation matrix of:

(  1    0     0    0 )
(  0   1/2  −1/2  −1 )
( −1   1/2   1/2   0 )
(  0    0     0    1 )
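
A corresponding sketch for Toom-2.5, where both divisions by 2 are exact:

    # Toom-2.5: from r(0), r(1), r(-1), r(∞) recover r0..r3.
    def interpolate_toom25(at0, at1, atm1, atinf):
        r2 = (at1 + atm1) // 2 - at0
        r1 = (at1 - atm1) // 2 - atinf
        return (at0, r1, r2, atinf)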

Notes

  1. Knuth, p. 296
  2. Crandall & Pomerance, p. 474
  3. Crandall & Pomerance, p. 536
  4. Knuth, p. 302
  5. Positive Results, chapter III of Stephen A. Cook: On the Minimum Computation Time of Functions.
  6. Marco Bodrato. Towards Optimal Toom–Cook Multiplication for Univariate and Multivariate Polynomials in Characteristic 2 and 0. In WAIFI'07 proceedings, volume 4547 of LNCS, pages 116–133. June 21–22, 2007.

References

  - Donald E. Knuth. The Art of Computer Programming, Volume 2: Seminumerical Algorithms (third edition). Addison-Wesley, 1997.
  - Richard Crandall and Carl Pomerance. Prime Numbers: A Computational Perspective (second edition). Springer, 2005.