Capturing the Expert: Generating Fast Matrix-Multiply Kernels with Spiral

Veras, Richard; Franchetti, Franz

doi:10.1007/978-3-319-17353-5_20


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8969)

Included in the following conference series:

International Conference on High Performance Computing for Computational Science


Abstract

Matrix-Matrix Multiplication (MMM) is a fundamental operation in scientific computing. Achieving floating-point peak performance with this operation requires expert knowledge of linear algebra and computer architecture to craft an implementation tuned for a given microarchitecture. To do this, an expert follows a systematic process. First, the expert derives an algorithm for MMM from models found in the literature. Then, the expert applies optimizations that are well suited to the target architecture. Lastly, the expert expresses that implementation in assembly code. In this paper, we argue that this process is mechanical and can be captured in a rule-based program generation system such as Spiral. We then show that, given this machinery, Spiral can produce code for large MMM sizes that is competitive with hand-tuned code.
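To make the "models found in the literature" concrete, here is a minimal sketch of the layered blocking structure popularized by Goto and van de Geijn and by BLIS: three cache-level loops (NC, KC, MC), packing of A and B into contiguous micro-panels, and an MR x NR register-blocked micro-kernel. This is an illustration in plain C, not code generated by Spiral or written by the authors. The blocking parameters and the names `gemm_blocked`, `pack_a`, `pack_b`, and `micro_kernel` are hypothetical; a tuned implementation would pick the parameters from the cache and register sizes of the target and replace the scalar micro-kernel with vectorized assembly.

```c
#include <stdlib.h>

/* Hypothetical blocking parameters; real values are chosen per
   microarchitecture (cache sizes, number of vector registers). */
#define MC 64
#define KC 128
#define NC 256
#define MR 4
#define NR 4

/* Column-major element access with leading dimension ld. */
#define IDX(i, j, ld) ((i) + (size_t)(j) * (ld))

/* Pack an mc x kc block of A into MR-row micro-panels (p-major inside a
   panel), zero-padding the last panel so the micro-kernel needs no
   bounds checks in its inner loop. */
static void pack_a(int mc, int kc, const double *A, int lda, double *Ap) {
    for (int i0 = 0; i0 < mc; i0 += MR)
        for (int p = 0; p < kc; ++p)
            for (int i = 0; i < MR; ++i)
                *Ap++ = (i0 + i < mc) ? A[IDX(i0 + i, p, lda)] : 0.0;
}

/* Pack a kc x nc block of B into NR-column micro-panels. */
static void pack_b(int kc, int nc, const double *B, int ldb, double *Bp) {
    for (int j0 = 0; j0 < nc; j0 += NR)
        for (int p = 0; p < kc; ++p)
            for (int j = 0; j < NR; ++j)
                *Bp++ = (j0 + j < nc) ? B[IDX(p, j0 + j, ldb)] : 0.0;
}

/* Micro-kernel: an MR x NR rank-kc update kept in local accumulators
   (ideally registers). Only the valid mr x nr corner is written to C. */
static void micro_kernel(int kc, const double *a, const double *b,
                         double *C, int ldc, int mr, int nr) {
    double acc[MR][NR] = {{0.0}};
    for (int p = 0; p < kc; ++p)
        for (int i = 0; i < MR; ++i)
            for (int j = 0; j < NR; ++j)
                acc[i][j] += a[p * MR + i] * b[p * NR + j];
    for (int i = 0; i < mr; ++i)
        for (int j = 0; j < nr; ++j)
            C[IDX(i, j, ldc)] += acc[i][j];
}

/* C += A * B in column-major storage; A is m x k, B is k x n. */
void gemm_blocked(int m, int n, int k, const double *A, int lda,
                  const double *B, int ldb, double *C, int ldc) {
    double *Ap = malloc(sizeof(double) * MC * KC); /* MC is a multiple of MR */
    double *Bp = malloc(sizeof(double) * KC * NC); /* NC is a multiple of NR */
    if (!Ap || !Bp) { free(Ap); free(Bp); return; }
    for (int jc = 0; jc < n; jc += NC) {              /* L3-sized slab of B */
        int nc = (n - jc < NC) ? n - jc : NC;
        for (int pc = 0; pc < k; pc += KC) {          /* shared k-block */
            int kc = (k - pc < KC) ? k - pc : KC;
            pack_b(kc, nc, &B[IDX(pc, jc, ldb)], ldb, Bp);
            for (int ic = 0; ic < m; ic += MC) {      /* L2-sized block of A */
                int mc = (m - ic < MC) ? m - ic : MC;
                pack_a(mc, kc, &A[IDX(ic, pc, lda)], lda, Ap);
                for (int jr = 0; jr < nc; jr += NR)   /* micro-panel loops */
                    for (int ir = 0; ir < mc; ir += MR)
                        micro_kernel(kc, &Ap[ir * kc], &Bp[jr * kc],
                                     &C[IDX(ic + ir, jc + jr, ldc)], ldc,
                                     (mc - ir < MR) ? mc - ir : MR,
                                     (nc - jr < NR) ? nc - jr : NR);
            }
        }
    }
    free(Ap);
    free(Bp);
}
```

The loop nest itself is fixed; what changes from one machine to the next is the choice of blocking parameters and the micro-kernel body. That separation is what makes the process amenable to the kind of mechanical, rule-based generation the abstract describes.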



Author information

Authors and Affiliations

Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, USA
Richard Veras & Franz Franchetti


Corresponding author

Correspondence to Richard Veras.

Editor information

Editors and Affiliations

IRIT, ENSEEIHT, Toulouse Cedex, France
Michel Daydé
Lawrence Berkeley National Laboratory, Berkeley, California, USA
Osni Marques
Information Technology Center, The University of Tokyo, Tokyo, Japan
Kengo Nakajima


About this paper

Cite this paper

Veras, R., Franchetti, F. (2015). Capturing the Expert: Generating Fast Matrix-Multiply Kernels with Spiral. In: Daydé, M., Marques, O., Nakajima, K. (eds) High Performance Computing for Computational Science – VECPAR 2014. VECPAR 2014. Lecture Notes in Computer Science, vol 8969. Springer, Cham. https://doi.org/10.1007/978-3-319-17353-5_20


DOI: https://doi.org/10.1007/978-3-319-17353-5_20
Published: 18 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17352-8
Online ISBN: 978-3-319-17353-5
eBook Packages: Computer Science, Computer Science (R0)
