Auto-tuning the Matrix Powers Kernel with SEJITS

Morlan, Jeffrey; Kamil, Shoaib; Fox, Armando

doi:10.1007/978-3-642-38718-0_36

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7851)

Included in the following conference series:

International Conference on High Performance Computing for Computational Science

Abstract

The matrix powers kernel, used in communication-avoiding Krylov subspace methods, requires runtime auto-tuning for best performance. We demonstrate how the SEJITS (Selective Embedded Just-In-Time Specialization) approach can deliver a high-performance, performance-portable implementation of the matrix powers kernel to application authors while separating their high-level concerns from the low-level optimization concerns of auto-tuner implementers. We discuss the benefits of delivering this kernel as a specializer rather than as a traditional library. We evaluate the performance of the matrix powers kernel specializer in the context of a communication-avoiding conjugate gradient (CA-CG) solver, which compares favorably to traditional CG.
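
For readers unfamiliar with the kernel, the following is a minimal reference sketch of what a matrix powers kernel computes in its simplest (monomial basis) form: the vectors x, Ax, A²x, ..., Aᵏx. It is not the specializer described in the paper. It is a naive version that performs k separate sparse matrix-vector products, which is the baseline that communication-avoiding implementations aim to improve on by reducing how often the matrix is read. The names `spmv`, `matrix_powers_naive`, and the CSR tuple layout are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# CSR storage as (values, column indices, row pointers).
# This tuple layout is a hypothetical choice for the sketch.
CSR = Tuple[Sequence[float], Sequence[int], Sequence[int]]


def spmv(A: CSR, x: Sequence[float]) -> List[float]:
    """Sparse matrix-vector product y = A x for a CSR matrix."""
    values, col_idx, row_ptr = A
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        # Nonzeros of this row live in values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def matrix_powers_naive(A: CSR, x: Sequence[float], k: int) -> List[List[float]]:
    """Return [x, A x, A^2 x, ..., A^k x] using k separate SpMV calls."""
    basis = [list(x)]
    for _ in range(k):
        # Each step re-reads the whole matrix; this is the cost that
        # communication-avoiding variants try to reduce.
        basis.append(spmv(A, basis[-1]))
    return basis


# 3x3 tridiagonal matrix [[2, -1, 0], [-1, 2, -1], [0, -1, 2]] in CSR form.
A_example: CSR = (
    [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0],
    [0, 1, 0, 1, 2, 1, 2],
    [0, 2, 5, 7],
)
powers = matrix_powers_naive(A_example, [1.0, 0.0, 0.0], 2)
```

In this sketch `powers` holds three vectors: the starting vector, its product with the matrix, and the product of that result with the matrix again. An auto-tuned kernel would produce the same vectors while choosing blocking and parallelization strategies suited to the matrix and the machine.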

Author information

Authors and Affiliations

Computer Science Division, University of California at Berkeley, Berkeley, CA, 94720, USA
Jeffrey Morlan, Shoaib Kamil & Armando Fox

Editor information

Editors and Affiliations

INPT (ENSEEIHT) - IRIT, University of Toulouse, 31062, Toulouse, France
Michel Daydé
Lawrence Berkeley National Laboratory, 94720-8139, Berkeley, CA, USA
Osni Marques
Information Technology Center, The University of Tokyo, 113-8658, Tokyo, Japan
Kengo Nakajima

About this paper

Cite this paper

Morlan, J., Kamil, S., Fox, A. (2013). Auto-tuning the Matrix Powers Kernel with SEJITS. In: Daydé, M., Marques, O., Nakajima, K. (eds) High Performance Computing for Computational Science - VECPAR 2012. Lecture Notes in Computer Science, vol 7851. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38718-0_36

DOI: https://doi.org/10.1007/978-3-642-38718-0_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38717-3
Online ISBN: 978-3-642-38718-0
eBook Packages: Computer Science, Computer Science (R0)
