A Family of High-Performance Matrix Multiplication Algorithms

  • Conference paper
Applied Parallel Computing. State of the Art in Scientific Computing (PARA 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3732)

Abstract

We describe a model of hierarchical memories and use it to determine an optimal strategy for blocking the operand matrices of matrix multiplication. The model extends an earlier related model by three of the authors. As before, the model predicts the form of current, state-of-the-art L1 kernels. Additionally, it shows that current L1 kernels can continue to deliver their high performance on operand matrices as large as the L2 cache. For a hierarchical memory with L memory levels (main memory and L-1 caches), our model reduces the number of potential matrix multiply algorithms from 6^L to four. We use the shapes of the matrix input operands to select one of our four algorithms. Previously, the reduced count was 2^L and the model was independent of the operand shapes. Because of space limitations, we do not include performance results.
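The blocking strategy the abstract refers to can be illustrated with a simple two-level blocked matrix multiply. This is only a sketch: the block sizes MB, NB, KB and the loop order below are hypothetical placeholders, whereas the paper's model derives them from the cache hierarchy and the operand shapes.

```python
# Illustrative sketch of cache blocking for C += A * B on plain Python lists.
# MB, NB, KB are stand-in block sizes; the paper's model would choose them
# (and the loop nesting order) from cache capacities and operand shapes.
def blocked_matmul(A, B, C, MB=64, NB=64, KB=64):
    m, k = len(A), len(A[0])
    n = len(B[0])
    for i0 in range(0, m, MB):              # walk C in MB x NB tiles
        for j0 in range(0, n, NB):
            for k0 in range(0, k, KB):      # stream KB-wide panels of A and B
                for i in range(i0, min(i0 + MB, m)):
                    for j in range(j0, min(j0 + NB, n)):
                        s = C[i][j]
                        for kk in range(k0, min(k0 + KB, k)):
                            s += A[i][kk] * B[kk][j]
                        C[i][j] = s
    return C
```

The point of the tiling is that each MB x NB tile of C and the panels of A and B it touches can be sized to fit a given cache level, which is exactly the resource-allocation question the hierarchical model answers.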



Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gunnels, J.A., Gustavson, F.G., Henry, G.M., van de Geijn, R.A. (2006). A Family of High-Performance Matrix Multiplication Algorithms. In: Dongarra, J., Madsen, K., Waśniewski, J. (eds) Applied Parallel Computing. State of the Art in Scientific Computing. PARA 2004. Lecture Notes in Computer Science, vol 3732. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11558958_30

  • DOI: https://doi.org/10.1007/11558958_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29067-4

  • Online ISBN: 978-3-540-33498-9