Strassen algorithm

Last updated

In linear algebra, the Strassen algorithm, named after Volker Strassen, is an algorithm for matrix multiplication. It is faster than the standard matrix multiplication algorithm for large matrices, with a better asymptotic complexity, although the naive algorithm is often better for smaller matrices. The Strassen algorithm is slower than the fastest known algorithms for extremely large matrices, but such galactic algorithms are not useful in practice, as they are much slower for matrices of practical size. For small matrices even faster algorithms exist.

Contents

Strassen's algorithm works for any ring, such as plus/multiply, but not all semirings, such as min-plus or boolean algebra, where the naive algorithm still works, and so called combinatorial matrix multiplication.

History

Volker Strassen first published this algorithm in 1969 and thereby proved that the general matrix multiplication algorithm was not optimal. [1] The Strassen algorithm's publication resulted in more research about matrix multiplication that led to both asymptotically lower bounds and improved computational upper bounds.

Algorithm

The left column visualizes the calculations necessary to determine the result of a 2x2 matrix multiplication. Naive matrix multiplication requires one multiplication for each "1" of the left column. Each of the other columns (M1-M7) represents a single one of the 7 multiplications in the Strassen algorithm. The sum of the columns M1-M7 gives the same result as the full matrix multiplication on the left. Strassen algorithm.svg
The left column visualizes the calculations necessary to determine the result of a 2x2 matrix multiplication. Naïve matrix multiplication requires one multiplication for each "1" of the left column. Each of the other columns (M1-M7) represents a single one of the 7 multiplications in the Strassen algorithm. The sum of the columns M1-M7 gives the same result as the full matrix multiplication on the left.

Let , be two square matrices over a ring , for example matrices whose entries are integers or the real numbers. The goal of matrix multiplication is to calculate the matrix product . The following exposition of the algorithm assumes that all of these matrices have sizes that are powers of two (i.e., ), but this is only conceptually necessary — if the matrices , are not of type , the "missing" rows and columns can be filled with zeros to obtain matrices with sizes of powers of two — though real implementations of the algorithm do not do this in practice.

The Strassen algorithm partitions , and into equally sized block matrices

with . The naive algorithm would be:

This construction does not reduce the number of multiplications: 8 multiplications of matrix blocks are still needed to calculate the matrices, the same number of multiplications needed when using standard matrix multiplication.

The Strassen algorithm defines instead new values:

using only 7 multiplications (one for each ) instead of 8. We may now express the in terms of :

We recursively iterate this division process until the submatrices degenerate into numbers (elements of the ring ). If, as mentioned above, the original matrix had a size that was not a power of 2, then the resulting product will have zero rows and columns just like and , and these will then be stripped at this point to obtain the (smaller) matrix we really wanted.

Practical implementations of Strassen's algorithm switch to standard methods of matrix multiplication for small enough submatrices, for which those algorithms are more efficient. The particular crossover point for which Strassen's algorithm is more efficient depends on the specific implementation and hardware. Earlier authors had estimated that Strassen's algorithm is faster for matrices with widths from 32 to 128 for optimized implementations. [2] However, it has been observed that this crossover point has been increasing in recent years, and a 2010 study found that even a single step of Strassen's algorithm is often not beneficial on current architectures, compared to a highly optimized traditional multiplication, until matrix sizes exceed 1000 or more, and even for matrix sizes of several thousand the benefit is typically marginal at best (around 10% or less). [3] A more recent study (2016) observed benefits for matrices as small as 512 and a benefit around 20%. [4]

Improvements to Strassen algorithm

It is possible to reduce the number of matrix additions by instead using the following form discovered by Winograd in 1971: [5]

where .

This reduces the number of matrix additions and subtractions from 18 to 15. The number of matrix multiplications is still 7, and the asymptotic complexity is the same. [6]

The algorithm was further optimised in 2017, [7] reducing the number of matrix additions per step to 12 while maintaining the number of matrix multiplications, and again in 2023: [8]

Asymptotic complexity

The outline of the algorithm above showed that one can get away with just 7, instead of the traditional 8, matrix-matrix multiplications for the sub-blocks of the matrix. On the other hand, one has to do additions and subtractions of blocks, though this is of no concern for the overall complexity: Adding matrices of size requires only operations whereas multiplication is substantially more expensive (traditionally addition or multiplication operations).

The question then is how many operations exactly one needs for Strassen's algorithms, and how this compares with the standard matrix multiplication that takes approximately (where ) arithmetic operations, i.e. an asymptotic complexity .

The number of additions and multiplications required in the Strassen algorithm can be calculated as follows: let be the number of operations for a matrix. Then by recursive application of the Strassen algorithm, we see that , for some constant that depends on the number of additions performed at each application of the algorithm. Hence , i.e., the asymptotic complexity for multiplying matrices of size using the Strassen algorithm is . The reduction in the number of arithmetic operations however comes at the price of a somewhat reduced numerical stability, [9] and the algorithm also requires significantly more memory compared to the naive algorithm. Both initial matrices must have their dimensions expanded to the next power of 2, which results in storing up to four times as many elements, and the seven auxiliary matrices each contain a quarter of the elements in the expanded ones.

Strassen's algorithm needs to be compared to the "naive" way of doing the matrix multiplication that would require 8 instead of 7 multiplications of sub-blocks. This would then give rise to the complexity one expects from the standard approach: . The comparison of these two algorithms shows that asymptotically, Strassen's algorithm is faster: There exists a size so that matrices that are larger are more efficiently multiplied with Strassen's algorithm than the "traditional" way. However, the asymptotic statement does not imply that Strassen's algorithm is always faster even for small matrices, and in practice this is in fact not the case: For small matrices, the cost of the additional additions of matrix blocks outweighs the savings in the number of multiplications. There are also other factors not captured by the analysis above, such as the difference in cost on today's hardware between loading data from memory onto processors vs. the cost of actually doing operations on this data. As a consequence of these sorts of considerations, Strassen's algorithm is typically only used on "large" matrices. This kind of effect is even more pronounced with alternative algorithms such as the one by Coppersmith and Winograd: While asymptotically even faster, the cross-over point is so large that the algorithm is not generally used on matrices one encounters in practice.

Rank or bilinear complexity

The bilinear complexity or rank of a bilinear map is an important concept in the asymptotic complexity of matrix multiplication. The rank of a bilinear map over a field F is defined as (somewhat of an abuse of notation)

In other words, the rank of a bilinear map is the length of its shortest bilinear computation. [10] The existence of Strassen's algorithm shows that the rank of matrix multiplication is no more than seven. To see this, let us express this algorithm (alongside the standard algorithm) as such a bilinear computation. In the case of matrices, the dual spaces A* and B* consist of maps into the field F induced by a scalar double-dot product , (i.e. in this case the sum of all the entries of a Hadamard product.)

Standard algorithmStrassen algorithm
1
2
3
4
5
6
7
8

It can be shown that the total number of elementary multiplications required for matrix multiplication is tightly asymptotically bound to the rank , i.e. , or more specifically, since the constants are known, . One useful property of the rank is that it is submultiplicative for tensor products, and this enables one to show that matrix multiplication can be accomplished with no more than elementary multiplications for any . (This -fold tensor product of the matrix multiplication map with itself an -th tensor poweris realized by the recursive step in the algorithm shown.)

Cache behavior

Strassen's algorithm is cache oblivious. Analysis of its cache behavior algorithm has shown it to incur

cache misses during its execution, assuming an idealized cache of size (i.e. with lines of length ). [11] :13

Implementation considerations

The description above states that the matrices are square, and the size is a power of two, and that padding should be used if needed. This restriction allows the matrices to be split in half, recursively, until limit of scalar multiplication is reached. The restriction simplifies the explanation, and analysis of complexity, but is not actually necessary; [12] and in fact, padding the matrix as described will increase the computation time and can easily eliminate the fairly narrow time savings obtained by using the method in the first place.

A good implementation will observe the following:

Furthermore, there is no need for the matrices to be square. Non-square matrices can be split in half using the same methods, yielding smaller non-square matrices. If the matrices are sufficiently non-square it will be worthwhile reducing the initial operation to more square products, using simple methods which are essentially , for instance:

These techniques will make the implementation more complicated, compared to simply padding to a power-of-two square; however, it is a reasonable assumption that anyone undertaking an implementation of Strassen, rather than conventional multiplication, will place a higher priority on computational efficiency than on simplicity of the implementation.

In practice, Strassen's algorithm can be implemented to attain better performance than conventional multiplication even for matrices as small as , for matrices that are not at all square, and without requiring workspace beyond buffers that are already needed for a high-performance conventional multiplication. [4]

See also

Related Research Articles

In mathematics, the determinant is a scalar-valued function of the entries of a square matrix. The determinant of a matrix A is commonly denoted det(A), det A, or |A|. Its value characterizes some properties of the matrix and the linear map represented, on a given basis, by the matrix. In particular, the determinant is nonzero if and only if the matrix is invertible and the corresponding linear map is an isomorphism.

<span class="mw-page-title-main">Gaussian elimination</span> Algorithm for solving systems of linear equations

In mathematics, Gaussian elimination, also known as row reduction, is an algorithm for solving systems of linear equations. It consists of a sequence of row-wise operations performed on the corresponding matrix of coefficients. This method can also be used to compute the rank of a matrix, the determinant of a square matrix, and the inverse of an invertible matrix. The method is named after Carl Friedrich Gauss (1777–1855). To perform row reduction on a matrix, one uses a sequence of elementary row operations to modify the matrix until the lower left-hand corner of the matrix is filled with zeros, as much as possible. There are three types of elementary row operations:

<span class="mw-page-title-main">Matrix multiplication</span> Mathematical operation in linear algebra

In mathematics, specifically in linear algebra, matrix multiplication is a binary operation that produces a matrix from two matrices. For matrix multiplication, the number of columns in the first matrix must be equal to the number of rows in the second matrix. The resulting matrix, known as the matrix product, has the number of rows of the first and the number of columns of the second matrix. The product of matrices A and B is denoted as AB.

In linear algebra, the Cholesky decomposition or Cholesky factorization is a decomposition of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose, which is useful for efficient numerical solutions, e.g., Monte Carlo simulations. It was discovered by André-Louis Cholesky for real matrices, and posthumously published in 1924. When it is applicable, the Cholesky decomposition is roughly twice as efficient as the LU decomposition for solving systems of linear equations.

<span class="mw-page-title-main">Singular value decomposition</span> Matrix decomposition

In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix into a rotation, followed by a rescaling followed by another rotation. It generalizes the eigendecomposition of a square normal matrix with an orthonormal eigenbasis to any matrix. It is related to the polar decomposition.

In linear algebra, a Toeplitz matrix or diagonal-constant matrix, named after Otto Toeplitz, is a matrix in which each descending diagonal from left to right is constant. For instance, the following matrix is a Toeplitz matrix:

In linear algebra, Cramer's rule is an explicit formula for the solution of a system of linear equations with as many equations as unknowns, valid whenever the system has a unique solution. It expresses the solution in terms of the determinants of the (square) coefficient matrix and of matrices obtained from it by replacing one column by the column vector of right-sides of the equations. It is named after Gabriel Cramer, who published the rule for an arbitrary number of unknowns in 1750, although Colin Maclaurin also published special cases of the rule in 1748, and possibly knew of it as early as 1729.

In linear algebra, a diagonal matrix is a matrix in which the entries outside the main diagonal are all zero; the term usually refers to square matrices. Elements of the main diagonal can either be zero or nonzero. An example of a 2×2 diagonal matrix is , while an example of a 3×3 diagonal matrix is. An identity matrix of any size, or any multiple of it is a diagonal matrix called a scalar matrix, for example, . In geometry, a diagonal matrix may be used as a scaling matrix, since matrix multiplication with it results in changing scale (size) and possibly also shape; only a scalar matrix results in uniform change in scale.

In linear algebra, an invertible matrix is a square matrix which has an inverse. In other words, if some other matrix is multiplied by the invertible matrix, the result can be multiplied by an inverse to undo the operation. An invertible matrix multiplied by its inverse yields the identity matrix. Invertible matrices are the same size as their inverse.

In linear algebra, a QR decomposition, also known as a QR factorization or QU factorization, is a decomposition of a matrix A into a product A = QR of an orthonormal matrix Q and an upper triangular matrix R. QR decomposition is often used to solve the linear least squares (LLS) problem and is the basis for a particular eigenvalue algorithm, the QR algorithm.

<span class="mw-page-title-main">Block matrix</span> Matrix defined using smaller matrices called blocks

In mathematics, a block matrix or a partitioned matrix is a matrix that is interpreted as having been broken into sections called blocks or submatrices.

In mathematics, the Kronecker product, sometimes denoted by ⊗, is an operation on two matrices of arbitrary size resulting in a block matrix. It is a specialization of the tensor product from vectors to matrices and gives the matrix of the tensor product linear map with respect to a standard choice of basis. The Kronecker product is to be distinguished from the usual matrix multiplication, which is an entirely different operation. The Kronecker product is also sometimes called matrix direct product.

In linear algebra, a rotation matrix is a transformation matrix that is used to perform a rotation in Euclidean space. For example, using the convention below, the matrix

In mathematics, the kernel of a linear map, also known as the null space or nullspace, is the part of the domain which is mapped to the zero vector of the co-domain; the kernel is always a linear subspace of the domain. That is, given a linear map L : VW between two vector spaces V and W, the kernel of L is the vector space of all elements v of V such that L(v) = 0, where 0 denotes the zero vector in W, or more symbolically:

In numerical analysis and linear algebra, lower–upper (LU) decomposition or factorization factors a matrix as the product of a lower triangular matrix and an upper triangular matrix. The product sometimes includes a permutation matrix as well. LU decomposition can be viewed as the matrix form of Gaussian elimination. Computers usually solve square systems of linear equations using LU decomposition, and it is also a key step when inverting a matrix or computing the determinant of a matrix. The LU decomposition was introduced by the Polish astronomer Tadeusz Banachiewicz in 1938. To quote: "It appears that Gauss and Doolittle applied the method [of elimination] only to symmetric equations. More recent authors, for example, Aitken, Banachiewicz, Dwyer, and Crout … have emphasized the use of the method, or variations of it, in connection with non-symmetric problems … Banachiewicz … saw the point … that the basic problem is really one of matrix factorization, or “decomposition” as he called it." It is also sometimes referred to as LR decomposition.

In mathematics, especially in linear algebra and matrix theory, the vectorization of a matrix is a linear transformation which converts the matrix into a vector. Specifically, the vectorization of a m × n matrix A, denoted vec(A), is the mn × 1 column vector obtained by stacking the columns of the matrix A on top of one another: Here, represents the element in the i-th row and j-th column of A, and the superscript denotes the transpose. Vectorization expresses, through coordinates, the isomorphism between these (i.e., of matrices and vectors) as vector spaces.

<span class="mw-page-title-main">Matrix (mathematics)</span> Array of numbers

In mathematics, a matrix is a rectangular array or table of numbers, symbols, or expressions, with elements or entries arranged in rows and columns, which is used to represent a mathematical object or property of such an object.

Because matrix multiplication is such a central operation in many numerical algorithms, much work has been invested in making matrix multiplication algorithms efficient. Applications of matrix multiplication in computational problems are found in many fields including scientific computing and pattern recognition and in seemingly unrelated problems such as counting the paths through a graph. Many different algorithms have been designed for multiplying matrices on different types of hardware, including parallel and distributed systems, where the computational work is spread over multiple processors.

In theoretical computer science, the computational complexity of matrix multiplication dictates how quickly the operation of matrix multiplication can be performed. Matrix multiplication algorithms are a central subroutine in theoretical and numerical algorithms for numerical linear algebra and optimization, so finding the fastest algorithm for matrix multiplication is of major practical relevance.

<span class="mw-page-title-main">Hadamard product (matrices)</span> Elementwise product of two matrices

In mathematics, the Hadamard product is a binary operation that takes in two matrices of the same dimensions and returns a matrix of the multiplied corresponding elements. This operation can be thought as a "naive matrix multiplication" and is different from the matrix product. It is attributed to, and named after, either French mathematician Jacques Hadamard or German mathematician Issai Schur.

References

  1. Strassen, Volker (1969). "Gaussian Elimination is not Optimal". Numer. Math. 13 (4): 354–356. doi:10.1007/BF02165411. S2CID   121656251.
  2. Skiena, Steven S. (1998), "§8.2.3 Matrix multiplication", The Algorithm Design Manual, Berlin, New York: Springer-Verlag, ISBN   978-0-387-94860-7 .
  3. 1 2 D'Alberto, Paolo; Nicolau, Alexandru (2005). Using Recursion to Boost ATLAS's Performance (PDF). Sixth Int'l Symp. on High Performance Computing.
  4. 1 2 Huang, Jianyu; Smith, Tyler M.; Henry, Greg M.; van de Geijn, Robert A. (13 Nov 2016). Strassen's Algorithm Reloaded. SC16: The International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE Press. pp. 690–701. doi:10.1109/SC.2016.58. ISBN   9781467388153 . Retrieved 1 Nov 2022.
  5. Winograd, S. (October 1971). "On multiplication of 2 × 2 matrices". Linear Algebra and Its Applications. 4 (4): 381–388. doi:10.1016/0024-3795(71)90009-7.
  6. Knuth (1997), p. 500.
  7. Karstadt, Elaye; Schwartz, Oded (2017-07-24). "Matrix Multiplication, a Little Faster". Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures. ACM. pp. 101–110. doi:10.1145/3087556.3087579. ISBN   978-1-4503-4593-4.
  8. Schwartz, Oded; Vaknin, Noa (2023-12-31). "Pebbling Game and Alternative Basis for High Performance Matrix Multiplication". SIAM Journal on Scientific Computing. 45 (6): C277 –C303. Bibcode:2023SJSC...45C.277S. doi:10.1137/22M1502719. ISSN   1064-8275.
  9. Webb, Miller (1975). "Computational complexity and numerical stability". SIAM J. Comput. 4 (2): 97–107. doi:10.1137/0204009.
  10. Burgisser; Clausen; Shokrollahi (1997). Algebraic Complexity Theory. Springer-Verlag. ISBN   3-540-60582-7.
  11. Frigo, M.; Leiserson, C. E.; Prokop, H.; Ramachandran, S. (1999). Cache-oblivious algorithms (PDF). Proc. IEEE Symp. on Foundations of Computer Science (FOCS). pp. 285–297.
  12. Higham, Nicholas J. (1990). "Exploiting fast matrix multiplication within the level 3 BLAS" (PDF). ACM Transactions on Mathematical Software. 16 (4): 352–368. doi:10.1145/98267.98290. hdl: 1813/6900 . S2CID   5715053.