Strassen’s \(2 \times 2\) matrix multiplication algorithm: a conceptual perspective
Abstract
The main purpose of this paper is pedagogical. Despite its importance, all proofs of the correctness of Strassen’s famous 1969 algorithm to multiply two \(2 \times 2\) matrices with only seven multiplications involve some basis-dependent calculations such as explicitly multiplying specific \(2 \times 2\) matrices, expanding expressions to cancel terms with opposing signs, or expanding tensors over the standard basis, sometimes involving clever simplifications using the sparsity of tensor summands. This makes the proof nontrivial to memorize, and many presentations of the proof avoid showing all the details and leave a significant amount of verification to the reader. In this note we give a short, self-contained, basis-independent proof of the existence of Strassen’s algorithm that avoids these types of calculations. We achieve this by focusing on symmetries and algebraic properties. Our proof can be seen as a coordinate-free version of the construction of Clausen from 1988, combined with recent work on the geometry of Strassen’s algorithm by Chiantini, Ikenmeyer, Landsberg, and Ottaviani from 2016.
Keywords
Matrix multiplication · Strassen’s algorithm · Coordinate-free · Elementary
Mathematics Subject Classification
68W30 Symbolic computation and algebraic computation
1 Introduction
The discovery of Strassen’s matrix multiplication algorithm [28] was a breakthrough result in computational linear algebra. The study of fast (subcubic) matrix multiplication algorithms initiated by this discovery has become an important area of research (see [3] for a survey and [21] for the currently best upper bound on the complexity of matrix multiplication). Fast matrix multiplication has countless applications as a subroutine in algorithms for a wide variety of problems, see e.g. [7, §16] for numerous applications in computational linear algebra. In practice, algorithms more sophisticated than Strassen’s are rarely implemented, but Strassen’s algorithm is used for multiplication of large matrices (see [13, 19, 25] on practical fast matrix multiplication).
Strassen [28] showed that the product of two \(2 \times 2\) matrices X and Y can be computed with only seven multiplications via a decomposition of the form
$$XY = \sum _{k=1}^{7} u_k(X)\, v_k(Y)\, W_k, \qquad (\star )$$
where \(u_k\) and \(v_k\) are cleverly chosen linear forms on the space of \(2 \times 2\) matrices and the \(W_k\) are seven explicit \(2 \times 2\) matrices. Because of this structure it can be applied to block matrices, and its recursive application results in an algorithm for the multiplication of two \(n \times n\) matrices using \(O(n^{\log _2 7})\) arithmetic operations (see [7, §15.2] or [3] for details).
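To illustrate the recursive use of a decomposition of this kind on block matrices, here is a minimal Python sketch. The seven products below are the classical choice from Strassen’s original paper [28], not the basis-independent construction of this note; the sketch only illustrates the recursion.

```python
import numpy as np

def strassen(A, B):
    """Multiply two n x n matrices (n a power of 2) with 7 recursive block products."""
    n = A.shape[0]
    if n == 1:
        return A * B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # Strassen's seven block products
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    # reassemble the four blocks of the product
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4, M1 - M2 + M3 + M6]])
```

Each level of the recursion replaces 8 block multiplications by 7, which is exactly where the exponent \(\log _2 7\) comes from.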
Because of the great importance of Strassen’s algorithm, our goal is to understand it on a deep level. In Strassen’s original paper, the linear forms \(u_k\), \(v_k\), and the matrices \(W_k\) are given, but the verification of the correctness of the algorithm is left to the reader. Unfortunately, such a description does not yield many further immediate insights.
Shortly after Strassen’s paper, Gastinel [15] published a proof of the existence of decomposition (\(\star \)) using simple algebraic transformations that is much easier to follow and verify. Many other papers provide alternative descriptions of Strassen’s algorithm or proofs of its existence. Brent [4] and Paterson [26] present the algorithm in a graphical form using \(4 \times 4\) diagrams indicating which elements of the two matrices are used. A more formal version of these diagrams is given by matrices of linear forms, which are used, for example, by Fiduccia [14] (the same proof appears in [29]), Brockett and Dobkin [5], and Lafon [20]. Makarov [22] gives a proof that uses ideas from Karatsuba’s algorithm for the efficient multiplication of polynomials. Büchi and Clausen [6] connect the existence of Strassen’s algorithm to the existence of special bases of the space of \(2 \times 2\) matrices in which the multiplication table has a specific structure (their results are more general and apply not only to matrix multiplication). Alekseyev [1] describes several algorithms for matrix multiplication as embeddings of the matrix algebra into a 7-dimensional nonassociative algebra with special properties.

coordinate-free: we do not use explicit matrices, which allows us to focus on the algebraic properties required to prove the correctness of the algorithm. We avoid all tedious explicit calculations, in particular any expansions of expressions and any verification of explicit sign cancellations. Our proof can be seen as a coordinate-free version of Clausen’s construction.

elementary: our proof uses only simple facts from basic linear algebra and does not require knowledge of representation theory. This is also why we do not use tensor language. The proofs from [10] and [18] are based on more complicated mathematics and may offer other insights.
Theorem 1
(Strassen) Over every field there exists a decomposition (\(\star \)), i.e., two \(2 \times 2\) matrices can be multiplied using only seven multiplications.
2 Preliminaries from linear algebra
If \(u_1, \ldots , u_n\) and \(v_1, \ldots , v_m\) form bases of the spaces of column vectors \(F^{n \times 1}\) and row vectors \(F^{1 \times m}\), respectively, then the nm products of the form \(u_i v_j\) form a basis of the space of matrices \(F^{n \times m}\).
The trace \(\mathop {{\text {tr}}}(A)\) of a square matrix A is the sum of its diagonal entries. If \(\mathop {{\text {tr}}}(A)\) is zero, then the matrix A is called traceless. Taking the trace of a product of (rectangular) matrices is invariant under cyclic shifts: \(\mathop {{\text {tr}}}(A_1 A_2 \cdots A_n) = \mathop {{\text {tr}}}(A_2 \cdots A_n A_1)\). As a consequence, the trace of a matrix is invariant under conjugation: \(\mathop {{\text {tr}}}(B^{-1}AB) = \mathop {{\text {tr}}}(ABB^{-1}) = \mathop {{\text {tr}}}(A)\). Another implication is that if u is a column vector and \(v^T\) is a row vector, then \(v^T u = \mathop {{\text {tr}}}(v^T u) = \mathop {{\text {tr}}}(u v^T)\).
The characteristic polynomial of a \(2 \times 2\) matrix A is \(\lambda ^2 - \mathop {{\text {tr}}}(A)\lambda + \det (A)\). The Cayley–Hamilton theorem says that substituting A for \(\lambda \) yields the zero matrix.
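Both facts are easy to sanity-check. The following small Python sketch (exact rational arithmetic; the example matrices are arbitrary choices, not from the paper) verifies the cyclic-shift invariance of the trace and the Cayley–Hamilton identity for a \(2 \times 2\) matrix:

```python
from fractions import Fraction as F

def mul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def tr(X):
    return X[0][0] + X[1][1]

def det(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

# arbitrarily chosen example matrices
A = [[F(2), F(3)], [F(-1), F(5)]]
B = [[F(0), F(1)], [F(1), F(4)]]

# cyclic-shift invariance of the trace: tr(AB) = tr(BA)
assert tr(mul(A, B)) == tr(mul(B, A))

# Cayley-Hamilton: A^2 - tr(A)*A + det(A)*id = 0
CH = [[mul(A, A)[i][j] - tr(A) * A[i][j] + det(A) * (i == j)
       for j in range(2)] for i in range(2)]
assert CH == [[0, 0], [0, 0]]
```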
3 Rotational symmetry
In this section we collect some standard facts about rotation matrices. We think of the \(2 \times 2\) matrix D as a rotation of the plane by \(120^\circ \), but to make our approach work over every field we use a more algebraic definition for D.
Let D have determinant 1 and trace \(-1\), that is, D has characteristic polynomial \(\lambda ^2 + \lambda + 1\). We assume that D is not a multiple of the identity \(\mathop {{\text {id}}}\) (this is implicitly satisfied if the characteristic is not 3). For example, we could choose \(D = \begin{bmatrix} 0&-1 \\ 1&-1 \end{bmatrix}\), the matrix that cyclically permutes the three vectors \(\begin{pmatrix}1\\ 0\end{pmatrix}\), \(\begin{pmatrix}0\\ 1\end{pmatrix}\), \(\begin{pmatrix}-1\\ -1\end{pmatrix}\).
Claim 2
For the matrix D we have \(D^3 = \mathop {{\text {id}}}\), \(D^{-1} = D^2\), \(D^{-2} = D\). Additionally, D has the following properties: \(\mathop {{\text {id}}}+ D + D^{-1} = 0\) and \(\mathop {{\text {tr}}}(D^{-1})=-1\).
Proof
The characteristic polynomial of D is \(\lambda ^2 + \lambda + 1\). By the Cayley–Hamilton theorem \(D^2 + D + \mathop {{\text {id}}}= 0\). Multiplying by D we obtain \(D+D^2+D^3=0=\mathop {{\text {id}}}+D+D^2\) and hence \(D^3 = \mathop {{\text {id}}}\). Consequently, \(D^{-1} = D^2\) and \(D^{-2} = D\). Using \(D^{-1}=D^2\) we get \(\mathop {{\text {id}}}+ D + D^{-1} = 0\). This implies \(\mathop {{\text {tr}}}(D^{-1}) = - \mathop {{\text {tr}}}(\mathop {{\text {id}}}) - \mathop {{\text {tr}}}(D) = -1\). \(\square \)
For every column vector u define \(u^{\perp }\) as the row vector satisfying conditions \(u^{\perp } u = 0\) and \(u^{\perp } D u = 1\). If u is not an eigenvector of D, then u and Du are linearly independent, so \(u^{\perp }\) is uniquely defined. If, on the other hand, u is an eigenvector of D, the two conditions are inconsistent and \(u^{\perp }\) does not exist.
We fix a vector u that is not an eigenvector of D and define \(u^{\perp }\) as above. In our example we could choose \(u=\begin{pmatrix}1\\ 0\end{pmatrix}\), which is not an eigenvector of \(\begin{bmatrix} 0&-1 \\ 1&-1 \end{bmatrix}\).
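For this concrete choice \(D = \begin{bmatrix} 0&-1 \\ 1&-1 \end{bmatrix}\) and \(u=(1,0)^T\) we get \(Du = (0,1)^T\), so \(u^\perp = (0,1)\). A small numerical sketch (an illustration only, not part of the proof) checking Claim 2 and the defining properties of \(u^\perp \):

```python
import numpy as np

D = np.array([[0, -1], [1, -1]])   # determinant 1, trace -1
I = np.eye(2, dtype=int)
u = np.array([1, 0])
u_perp = np.array([0, 1])          # solves u_perp @ u = 0 and u_perp @ D @ u = 1

Dinv = D @ D                       # D^{-1} = D^2 by Claim 2

assert np.array_equal(D @ D @ D, I)                 # D^3 = id
assert np.array_equal(I + D + Dinv, 0 * I)          # id + D + D^{-1} = 0
assert np.trace(Dinv) == -1                         # tr(D^{-1}) = -1
assert u_perp @ u == 0 and u_perp @ D @ u == 1      # defining properties of u_perp
assert u_perp @ Dinv @ u == -1                      # Claim 4 below
```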
A first simple observation relates \(u^\perp \) and \((Du)^{\perp }\):
Claim 3
\(u^\perp D^{-1} = (D u)^\perp \).
Proof
We need to verify the two defining properties for \((D u)^\perp \). We have \((u^\perp D^{-1})(D u) = u^\perp u = 0\) and \((u^\perp D^{-1}) D (D u) = u^\perp D u = 1\) as required. \(\square \)
The following observation complements the fact that \(u^\perp D u =1\).
Claim 4
\(u^\perp D^{-1} u=-1\).
Proof
Multiplying the identity \(\mathop {{\text {id}}}+ D + D^{-1} = 0\) from Claim 2 by \(u^\perp \) on the left and by u on the right gives \(u^\perp u + u^\perp D u + u^\perp D^{-1} u = 0\), that is, \(0 + 1 + u^\perp D^{-1} u = 0\). \(\square \)
4 Seven multiplications suffice
In this section we apply structural properties from Sect. 3 to prove Theorem 1. We set \(M := u u^\perp \). Clearly \(\mathop {{\text {tr}}}(M) = u^\perp u = 0\) and we obtain the following identities that can be used to simplify products of M, D, and \(D^{-1}\):
Claim 5
\(M^2 = 0\) and \(MDM = M\) and \(M D^{-1} M = -M\).
Proof
Since \(M = u u^\perp \), we have \(M^2 = u (u^\perp u) u^\perp = 0\), \(MDM = u (u^\perp D u) u^\perp = u u^\perp = M\), and, by Claim 4, \(M D^{-1} M = u (u^\perp D^{-1} u) u^\perp = -u u^\perp = -M\). \(\square \)
By Claim 2, conjugation with D is a map of order 3 on the vector space of all \(2 \times 2\) matrices, i.e. for any matrix A there is a triple of conjugates \(A \mapsto D^{-1}AD \mapsto DAD^{-1} \mapsto A\). Moreover, if A is traceless, then so are its conjugates.
Claim 6
The matrices M, \(D^{-1}MD\), and \(DMD^{-1}\) form a basis of the vector space of traceless matrices.
Proof
Since M is traceless, its conjugates are also traceless. Hence it is enough to prove that M, \(D^{-1}MD\) and \(DMD^{-1}\) are linearly independent. Suppose \(\alpha M + \beta D^{-1}MD + \gamma DMD^{-1} = 0\). Applying this matrix to the vector u and using \(Mu = u(u^\perp u) = 0\), \(M(Du) = u(u^\perp D u) = u\) and \(M(D^{-1}u) = u(u^\perp D^{-1} u) = -u\) (Claim 4), we obtain \(\beta D^{-1}u - \gamma Du = 0\). Since u is not an eigenvector of D, the vectors Du and \(D^{-1}u = -u - Du\) are linearly independent, so \(\beta = \gamma = 0\). Then \(\alpha M = 0\), and since \(M = uu^\perp \ne 0\), also \(\alpha = 0\). \(\square \)
Since D and \(D^{-1}\) have trace \(-1 \ne 0\) (Claim 2), adding D or \(D^{-1}\) to the basis in Claim 6 yields two bases for the full space of \(2 \times 2\) matrices: \(\{ D, M, D^{-1}MD, DMD^{-1} \}\) and \(\{ D^{-1}, M, D^{-1}MD, DMD^{-1} \}\).
Using the properties \(D^2 = D^{-1}\), \(D^{-2} = D\) and \(M^2 = 0\) from Claim 2 and Claim 5, we can write down the multiplication table with respect to these two bases. We further simplify it using the identities \(MDM = M\) and \(MD^{-1}M = -M\) from Claim 5.
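Continuing the running example (\(D = \begin{bmatrix} 0&-1 \\ 1&-1 \end{bmatrix}\), \(u = (1,0)^T\), so \(M = uu^\perp = \begin{bmatrix} 0&1 \\ 0&0 \end{bmatrix}\)), the identities of Claim 5 and the linear independence underlying Claim 6 can be checked numerically; again this is an illustration only, not part of the proof:

```python
import numpy as np

D = np.array([[0, -1], [1, -1]])
Dinv = D @ D                       # D^{-1} = D^2 (Claim 2)
M = np.array([[0, 1], [0, 0]])     # M = u u^perp for u = (1,0), u^perp = (0,1)
Z = np.zeros((2, 2), dtype=int)

# Claim 5
assert np.array_equal(M @ M, Z)
assert np.array_equal(M @ D @ M, M)
assert np.array_equal(M @ Dinv @ M, -M)

# the conjugates of M are traceless (Claim 6) ...
conj1 = Dinv @ M @ D
conj2 = D @ M @ Dinv
assert np.trace(conj1) == 0 and np.trace(conj2) == 0
# ... and {D, M, D^{-1}MD, DMD^{-1}} spans all 2x2 matrices:
basis = np.stack([D, M, conj1, conj2]).reshape(4, 4)
assert np.linalg.matrix_rank(basis) == 4
```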
Proof of Theorem 1
Remark
Taking the trace in (4.1) and using the fact that M and its conjugates are traceless, we see that \(\mathop {{\text {tr}}}(X)=x_1 \mathop {{\text {tr}}}(D) = -x_1\) and \(\mathop {{\text {tr}}}(Y)=-y_1\). Thus the first of the 7 summands is \(\mathop {{\text {tr}}}(X)\mathop {{\text {tr}}}(Y)\mathop {{\text {id}}}\).
Acknowledgements
Open access funding provided by Max Planck Society. The authors thank Alin Bostan, Joshua Grochow and anonymous referees for comments and pointers to the literature.
References
1. Alekseyev, V.B.: Maximal extensions with simple multiplication for the algebra of matrices of the second order. Discrete Math. Appl. 7(1), 89–102 (1996). https://doi.org/10.1515/dma.1997.7.1.89
2. Ballard, G., Ikenmeyer, C., Landsberg, J.M., Ryder, N.: The geometry of rank decompositions of matrix multiplication II: \(3 \times 3\) matrices. J. Pure Appl. Algebra 223, 3205–3224 (2019)
3. Bläser, M.: Fast matrix multiplication. Theory Comput. Grad. Surv. 5, 1–60 (2013). https://doi.org/10.4086/toc.gs.2013.005
4. Brent, R.P.: Algorithms for matrix multiplication. Technical Report STAN-CS-70-157, Stanford University, Department of Computer Science (1970). https://doi.org/10.21236/ad0705509
5. Brockett, R.W., Dobkin, D.: On the optimal evaluation of a set of bilinear forms. In: Proceedings of the 5th ACM STOC, pp. 88–95 (1973). https://doi.org/10.1145/800125.804039
6. Büchi, W., Clausen, M.: On a class of primary algebras of minimal rank. Linear Algebra Appl. 69, 249–268 (1985). https://doi.org/10.1016/0024-3795(85)90080-1
7. Bürgisser, P., Clausen, M., Shokrollahi, M.A.: Algebraic Complexity Theory, volume 315 of Grundlehren der mathematischen Wissenschaften. Springer, Berlin (1997). https://doi.org/10.1007/978-3-662-03338-8
8. Burichenko, V.P.: On symmetries of the Strassen algorithm (2014). arXiv preprint arXiv:1408.6273
9. Chatelin, P.: On transformations of algorithms to multiply \(2 \times 2\) matrices. Inf. Process. Lett. 22(1), 1–5 (1986). https://doi.org/10.1016/0020-0190(86)90033-5
10. Chiantini, L., Ikenmeyer, C., Landsberg, J.M., Ottaviani, G.: The geometry of rank decompositions of matrix multiplication I: \(2 \times 2\) matrices. Exp. Math. (2017). https://doi.org/10.1080/10586458.2017.1403981
11. Clausen, M.: Beiträge zum Entwurf schneller Spektraltransformationen [Contributions to the design of fast spectral transforms]. Habilitationsschrift, Universität Karlsruhe (1988)
12. de Groote, H.F.: On varieties of optimal algorithms for the computation of bilinear mappings II: Optimal algorithms for \(2 \times 2\)-matrix multiplication. Theor. Comput. Sci. 7(2), 127–184 (1978). https://doi.org/10.1016/0304-3975(78)90045-2
13. Dumas, J.-G., Pan, V.Y.: Fast matrix multiplication and symbolic computation (2016). arXiv preprint arXiv:1612.05766
14. Fiduccia, C.M.: On obtaining upper bounds on the complexity of matrix multiplication. In: Complexity of Computer Computations, pp. 31–40 (1972). https://doi.org/10.1007/978-1-4684-2001-2_4
15. Gastinel, N.: Sur le calcul des produits de matrices [On the computation of matrix products]. Numer. Math. 17(3), 222–229 (1971). https://doi.org/10.1007/BF01436378
16. Gates, A.Q., Kreinovich, V.: Strassen’s algorithm made (somewhat) more natural: a pedagogical remark. Bull. EATCS 73, 142–145 (2001)
17. Grochow, J.A., Moore, C.: Matrix multiplication algorithms from group orbits (2016). arXiv preprint arXiv:1612.01527
18. Grochow, J.A., Moore, C.: Designing Strassen’s algorithm (2017). arXiv preprint arXiv:1708.09398
19. Huang, J., Rice, L., Matthews, D.A., van de Geijn, R.A.: Generating families of practical fast matrix multiplication algorithms. In: Proc. IPDPS 2017, pp. 656–667 (2017). https://doi.org/10.1109/IPDPS.2017.56
20. Lafon, J.C.: Optimum computation of p bilinear forms. Linear Algebra Appl. 10(3), 225–240 (1975). https://doi.org/10.1016/0024-3795(75)90071-3
21. Le Gall, F.: Powers of tensors and fast matrix multiplication. In: Proc. ISSAC 2014, pp. 296–303 (2014). https://doi.org/10.1145/2608628.2608664
22. Makarov, O.M.: The connection between two multiplication algorithms. USSR Comput. Math. Math. Phys. 15(1), 218–223 (1975). https://doi.org/10.1016/0041-5553(75)90149-4
23. Pan, V.Y.: On schemes for the computation of products of matrices and of the inverse matrix (in Russian). Usp. Mat. Nauk 27(5(167)), 249–250 (1972). English translation available in [24]. http://mi.mathnet.ru/umn5125
24. Pan, V.Y.: Better late than never: filling a void in the history of fast matrix multiplication and tensor decompositions (2014). arXiv preprint arXiv:1411.1972
25. Pan, V.Y.: Fast matrix multiplication and its algebraic neighbourhood. Sb. Math. 208(11), 1661–1704 (2017). https://doi.org/10.1070/SM8833
26. Paterson, M.: Complexity of product and closure algorithms for matrices. In: Proceedings of the ICM 1974, vol. 2, pp. 483–489 (1974). https://www.mathunion.org/fileadmin/ICM/Proceedings/ICM1974.2/ICM1974.2.ocr.pdf#page=491. Cited 03 Feb 2018
27. Paterson, M.: Strassen symmetries. Presentation at Leslie Valiant’s 60th birthday celebration, 30 May 2009, Bethesda, Maryland, USA (2009). https://www.cis.upenn.edu/~mkearns/valiant/paterson.ppt. Cited 03 Feb 2018
28. Strassen, V.: Gaussian elimination is not optimal. Numer. Math. 13(4), 354–356 (1969). https://doi.org/10.1007/BF02165411
29. Yuval, G.: A simple proof of Strassen’s result. Inf. Process. Lett. 7(6), 285–286 (1978). https://doi.org/10.1016/0020-0190(78)90018-2
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.