Cache Oblivious Sparse Matrix Multiplication
Abstract
We study the problem of sparse matrix multiplication in the Random Access Machine (RAM) model and in the Ideal Cache-Oblivious model. We present a simple algorithm that exploits randomization to compute the product of two sparse matrices with entries over an arbitrary field. Let \(A \in \mathbb{F}^{n \times n}\) and \(C \in \mathbb{F}^{n \times n}\) be matrices with \(h\) nonzero entries in total over a field \(\mathbb{F}\). In the RAM model, we compute all \(k\) nonzero entries of the product matrix \(AC \in \mathbb{F}^{n \times n}\) in \(\tilde{\mathcal{O}}(h + kn)\) time and \(\mathcal{O}(h)\) space, where the notation \(\tilde{\mathcal{O}}(\cdot)\) suppresses logarithmic factors. In the External Memory model, we compute all \(k\) nonzero entries of \(AC\) cache-obliviously using \(\tilde{\mathcal{O}}(h/B + kn/B)\) I/Os and \(\mathcal{O}(h)\) space. In the Parallel External Memory model, we compute all \(k\) nonzero entries of \(AC\) using \(\tilde{\mathcal{O}}(h/PB + kn/PB)\) time and \(\mathcal{O}(h)\) space; the External Memory bound is thus the special case of Parallel External Memory with \(P = 1\). The guarantees are stated in terms of the size of the field: by requiring \(|\mathbb{F}| > kn \log(n^2/k)\), we bound the error probability of computing the matrix product by \(1/n\).
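For context, the sketch below shows the standard deterministic row-by-row sparse product that the stated bounds should be compared against. It is a naive RAM-model baseline, not the paper's randomized \(\tilde{\mathcal{O}}(h + kn)\) algorithm: matrices are stored as coordinate dictionaries holding only their \(h\) nonzero entries, and the representation of the field (here plain integers) is an illustrative assumption.

```python
# Naive sparse matrix product. Matrices are dicts {(i, j): value} holding
# only the nonzero entries. This is a simple baseline for comparison, not
# the randomized algorithm described in the abstract.
from collections import defaultdict

def sparse_matmul(A, C):
    """Return the nonzero entries of A*C as a dict {(i, l): value}."""
    # Index the rows of C once, so each nonzero A[i, j] is joined only
    # with the nonzeros in row j of C.
    rows_of_C = defaultdict(list)
    for (i, j), v in C.items():
        rows_of_C[i].append((j, v))

    out = defaultdict(int)
    for (i, j), a in A.items():
        for (l, c) in rows_of_C[j]:
            out[(i, l)] += a * c
    # Entries may cancel to zero over a field; keep only true nonzeros.
    return {pos: v for pos, v in out.items() if v != 0}

A = {(0, 0): 2, (0, 1): 3, (1, 1): -1}
C = {(0, 0): 1, (1, 0): 4, (1, 1): 5}
print(sparse_matmul(A, C))  # {(0, 0): 14, (0, 1): 15, (1, 0): -4, (1, 1): -5}
```

Each nonzero of \(A\) is combined with every nonzero in the matching row of \(C\), so the work can reach \(\Theta(h^2)\) in the worst case, which is exactly the gap the output-sensitive bounds in the abstract address.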