Abstract
Matrix-Matrix Multiplication (MMM) is a fundamental operation in scientific computing. Achieving the floating-point peak with this operation requires expert knowledge of linear algebra and computer architecture to craft an implementation tuned for a given microarchitecture. To do this, an expert follows a mechanical process: first, the expert derives an algorithm from models found in the literature; then, the expert applies optimizations that are well suited to the target architecture; lastly, the expert expresses the implementation in assembly code. In this paper, we argue that this process is mechanical and can be captured in a rule-based program generation system such as Spiral. We then show that, given this machinery, Spiral can produce code for large MMM implementations that is competitive with hand-tuned code.
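To make the first two steps of that process concrete, the following is a minimal sketch of a blocked MMM in C: an outer loop nest that partitions the operands into tiles, and a small micro-kernel that accumulates one tile of C in local variables. It is not Spiral's output and not the authors' kernel. The names `blocked_mmm` and `micro_kernel` and the block sizes `MR`, `NR` and `KC` are hypothetical choices for illustration, and the row-major layout is an assumption. A tuned implementation would also pack the operands and express the micro-kernel with vector instructions or assembly.

```c
#include <stddef.h>

/* Hypothetical block sizes; a tuned kernel would derive these from the
 * register file and cache sizes of the target microarchitecture. */
enum { MR = 4, NR = 4, KC = 64 };

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Micro-kernel: C[mr x nr] += A[mr x kc] * B[kc x nr].
 * All matrices are row-major with explicit leading dimensions.
 * The accumulator tile stands in for the registers of a real kernel. */
static void micro_kernel(size_t mr, size_t nr, size_t kc,
                         const double *a, size_t lda,
                         const double *b, size_t ldb,
                         double *c, size_t ldc)
{
    double acc[MR][NR] = {{0.0}};

    for (size_t p = 0; p < kc; ++p)
        for (size_t i = 0; i < mr; ++i)
            for (size_t j = 0; j < nr; ++j)
                acc[i][j] += a[i * lda + p] * b[p * ldb + j];

    for (size_t i = 0; i < mr; ++i)
        for (size_t j = 0; j < nr; ++j)
            c[i * ldc + j] += acc[i][j];
}

/* Blocked MMM: C (m x n) += A (m x k) * B (k x n), all row-major.
 * The loops walk over KC-deep panels and MR x NR tiles of C,
 * handling partial tiles at the matrix edges. */
void blocked_mmm(size_t m, size_t n, size_t k,
                 const double *A, const double *B, double *C)
{
    for (size_t pc = 0; pc < k; pc += KC) {
        size_t kc = min_sz(KC, k - pc);
        for (size_t ic = 0; ic < m; ic += MR) {
            size_t mr = min_sz(MR, m - ic);
            for (size_t jc = 0; jc < n; jc += NR) {
                size_t nr = min_sz(NR, n - jc);
                micro_kernel(mr, nr, kc,
                             &A[ic * k + pc], k,
                             &B[pc * n + jc], n,
                             &C[ic * n + jc], n);
            }
        }
    }
}
```

Calling `blocked_mmm(m, n, k, A, B, C)` with a zero-initialized `C` computes the product of `A` and `B`. The split between a fixed loop structure and a small, architecture-specific inner kernel is the kind of separation that makes the expert's process amenable to rule-based generation.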
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Veras, R., Franchetti, F. (2015). Capturing the Expert: Generating Fast Matrix-Multiply Kernels with Spiral. In: Daydé, M., Marques, O., Nakajima, K. (eds) High Performance Computing for Computational Science -- VECPAR 2014. Lecture Notes in Computer Science, vol 8969. Springer, Cham. https://doi.org/10.1007/978-3-319-17353-5_20
DOI: https://doi.org/10.1007/978-3-319-17353-5_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17352-8
Online ISBN: 978-3-319-17353-5
eBook Packages: Computer Science, Computer Science (R0)