A Family of High-Performance Matrix Multiplication Algorithms

  • Conference paper
Applied Parallel Computing. State of the Art in Scientific Computing (PARA 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3732)

Abstract

We describe a model of hierarchical memories and use it to determine an optimal strategy for blocking the operand matrices of matrix multiplication. The model extends an earlier related model by three of the authors. As before, the model predicts the form of current, state-of-the-art L1 kernels. Additionally, it shows that current L1 kernels can continue to deliver their high performance on operand matrices as large as the L2 cache. For a hierarchical memory with L memory levels (main memory and L-1 caches), our model reduces the number of potential matrix multiply algorithms from 6^L to four. We use the shapes of the matrix input operands to select one of our four algorithms. Previously, the reduced count was 2^L and the model was independent of the operand shapes. Because of space limitations, we do not include performance results.
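The blocking strategy the abstract refers to can be illustrated with a simple two-level blocked matrix multiply. This is only a sketch: the block sizes MB, NB, KB and the loop order below are hypothetical placeholders, whereas the paper's model derives them from the cache hierarchy and the operand shapes.

```python
# Illustrative sketch of cache blocking for C += A * B on plain Python lists.
# MB, NB, KB are stand-in block sizes; the paper's model would choose them
# (and the loop nesting order) from cache capacities and operand shapes.
def blocked_matmul(A, B, C, MB=64, NB=64, KB=64):
    m, k = len(A), len(A[0])
    n = len(B[0])
    for i0 in range(0, m, MB):              # walk C in MB x NB tiles
        for j0 in range(0, n, NB):
            for k0 in range(0, k, KB):      # stream KB-wide panels of A and B
                for i in range(i0, min(i0 + MB, m)):
                    for j in range(j0, min(j0 + NB, n)):
                        s = C[i][j]
                        for kk in range(k0, min(k0 + KB, k)):
                            s += A[i][kk] * B[kk][j]
                        C[i][j] = s
    return C
```

The point of the tiling is that each MB x NB tile of C and the panels of A and B it touches can be sized to fit a given cache level, which is exactly the resource-allocation question the hierarchical model answers.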



Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gunnels, J.A., Gustavson, F.G., Henry, G.M., van de Geijn, R.A. (2006). A Family of High-Performance Matrix Multiplication Algorithms. In: Dongarra, J., Madsen, K., Waśniewski, J. (eds) Applied Parallel Computing. State of the Art in Scientific Computing. PARA 2004. Lecture Notes in Computer Science, vol 3732. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11558958_30

  • DOI: https://doi.org/10.1007/11558958_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29067-4

  • Online ISBN: 978-3-540-33498-9