Abstract
The development of optimized codes is time-consuming and requires extensive architecture, compiler, and language expertise; as a result, computational scientists are often forced to choose between investing considerable time in tuning code or accepting lower performance. In this paper, we describe the first steps toward a fully automated system for the optimization of the matrix algebra kernels that are a foundational part of many scientific applications. To generate highly optimized code from a high-level MATLAB prototype, we define a three-step approach. First, we have developed a compiler that converts a MATLAB script into simple C code. Second, we use the polyhedral optimization system Pluto to optimize that code for coarse-grained parallelism and locality simultaneously. Finally, we annotate the resulting code with performance-tuning directives and use the empirical performance-tuning system Orio to generate many tuned versions of the same operation using different optimization techniques, such as loop unrolling and memory alignment. Orio performs an automated empirical search to select the best among the multiple optimized code variants. We discuss performance results on two architectures.
References
Dongarra, J.J., Croz, J.D., Duff, I.S., Hammarling, S.: A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Soft. 16, 1–17 (1990)
MathWorks: MATLAB - The Language of Technical Computing, http://www.mathworks.com/products/matlab/
Menon, V., Pingali, K.: High-level semantic optimization of numerical codes. In: Proceedings of the 13th International Conference on Supercomputing, pp. 434–443. ACM Press, New York (1999)
Kennedy, K., et al.: Telescoping languages project description (2006), http://telescoping.rice.edu
Goedecker, S., Hoisie, A.: Performance optimization of numerically intensive codes. Software Environments & Tools 12 (2001)
Anderson, E., Bai, Z., Bischof, C., Demmel, J., Dongarra, J., Croz, J.D., Greenbaum, A., Hammarling, S., McKenney, A., Ostrouchov, S., Sorensen, D.: LAPACK Users’ Guide, 2nd edn. SIAM, Philadelphia (1995)
Whaley, R.C., Petitet, A., Dongarra, J.J.: Automated empirical optimization of software and the ATLAS project. Parallel Computing 27(1–2), 3–35 (2001)
Bilmes, J., Asanovic, K., Chin, C.W., Demmel, J.: Optimizing matrix multiply using PHiPAC: A portable, high-performance, ANSI C coding methodology. In: International Conference on Supercomputing, pp. 340–347 (1997)
Vuduc, R., Demmel, J., Yelick, K.: OSKI: A library of automatically tuned sparse matrix kernels. In: Proceedings of SciDAC 2005. Journal of Physics: Conference Series, vol. 16, pp. 521–530. Institute of Physics Publishing (June 2005)
Fowler, R., Jin, G., Mellor-Crummey, J.: Increasing temporal locality with skewing and recursive blocking. In: Proceedings of SC 2001: High-Performance Computing and Networking (November 2001)
Yi, Q., Seymour, K., You, H., Vuduc, R., Quinlan, D.: POET: Parameterized optimizations for empirical tuning. In: Proceedings of the Parallel and Distributed Processing Symposium, 2007, pp. 1–8. IEEE, Los Alamitos (2007)
Bondhugula, U., Hartono, A., Ramanujam, J., Sadayappan, P.: Pluto: A practical and fully automatic polyhedral program optimization system. In: Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation (PLDI 2008), Tucson, AZ (June 2008)
Saad, Y.: SPARSKIT: A basic tool kit for sparse matrix computations. University of Minnesota, Department of Computer Science and Engineering (1990)
Goto, K., van de Geijn, R.: High-performance implementation of the level-3 BLAS. Technical Report TR-2006-23, The University of Texas at Austin, Department of Computer Sciences (2006)
Gropp, W.D., Kaushik, D.K., Keyes, D.E., Smith, B.F.: High-performance parallel implicit CFD. Parallel Computing 27, 337–362 (2001)
Saad, Y.: Iterative Methods for Sparse Linear Systems. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA (2003)
Jessup, E., Karlin, I., Siek, J.: Build to order linear algebra kernels. In: Proceedings of the IEEE International Symposium on Parallel and Distributed Processing (IPDPS), pp. 1–8. IEEE, Los Alamitos (2008)
Norris, B., Hartono, A., Gropp, W.: Annotations for productivity and performance portability. In: Petascale Computing: Algorithms and Applications. Computational Science, pp. 443–462. Chapman & Hall / CRC Press, Taylor and Francis Group (2007)
Hartono, A., Norris, B., Sadayappan, P.: Annotation-based empirical performance tuning using Orio. In: Proceedings of the 23rd IEEE International Parallel & Distributed Processing Symposium, Rome, Italy, IEEE, Los Alamitos (2009)
Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach, 2nd edn. Prentice Hall, Inc., Englewood Cliffs (2003)
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
Cite this paper
Norris, B., Hartono, A., Jessup, E., Siek, J. (2009). Generating Empirically Optimized Composed Matrix Kernels from MATLAB Prototypes. In: Allen, G., Nabrzyski, J., Seidel, E., van Albada, G.D., Dongarra, J., Sloot, P.M.A. (eds) Computational Science – ICCS 2009. Lecture Notes in Computer Science, vol 5544. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01970-8_25
DOI: https://doi.org/10.1007/978-3-642-01970-8_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01969-2
Online ISBN: 978-3-642-01970-8
eBook Packages: Computer Science (R0)