Tuning the Blocksize for Dense Linear Algebra Factorization Routines with the Roofline Model

Benner, Peter; Ezzatti, Pablo; Quintana-Ortí, Enrique S.; Remón, Alfredo; Silva, Juan P.

doi:10.1007/978-3-319-49956-7_2


Conference paper
First Online: 19 November 2016


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10049)

Abstract

The optimization of dense linear algebra operations is a fundamental task in the solution of many scientific computing applications. The Roofline Model (RM) is a tool that estimates the performance that a computational kernel can attain on a given hardware platform. The RM can therefore be used to investigate whether a computational kernel can be further accelerated. We present an approach, based on the RM, to optimize the algorithmic parameters of dense linear algebra kernels. In particular, we perform a basic analysis to identify the optimal values for the kernel parameters. As a proof of concept, we apply this technique to optimize a blocked algorithm for matrix inversion via Gauss-Jordan elimination. In addition, we extend this technique to multi-block computational kernels. An experimental evaluation validates the method and shows its convenience. We remark that the results can be extended to other computational kernels similar to Gauss-Jordan elimination, such as matrix factorizations and the solution of linear least squares problems.
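To make the idea in the abstract concrete, the following is a minimal sketch of how a roofline bound could guide a block size choice. It is not the authors' method. The roofline bound itself, min(peak performance, arithmetic intensity × memory bandwidth), is the standard formulation; everything else is an assumption made for illustration: the function names, the machine numbers used in any call, and in particular the traffic model for a rank-b update C += A·B, which assumes each operand moves to or from memory exactly once.

```python
def roofline_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Attainable GFLOPS under the roofline bound.

    intensity is in flops per byte, bandwidth in GB/s.
    """
    return min(peak_gflops, intensity * bandwidth_gbs)


def rank_b_update_intensity(n, b, bytes_per_word=8):
    """Flops per byte for C += A @ B, with C n-by-n, A n-by-b, B b-by-n.

    Assumed traffic model (hypothetical): C is read and written once,
    A and B are read once; caches are otherwise ignored.
    """
    flops = 2.0 * n * n * b
    words = 2.0 * n * n + 2.0 * n * b
    return flops / (words * bytes_per_word)


def smallest_compute_bound_blocksize(n, candidates, peak_gflops, bandwidth_gbs):
    """Smallest candidate b whose update reaches the compute roof, else None."""
    for b in sorted(candidates):
        intensity = rank_b_update_intensity(n, b)
        if roofline_gflops(intensity, peak_gflops, bandwidth_gbs) >= peak_gflops:
            return b
    return None
```

A call such as `smallest_compute_bound_blocksize(1000, [32, 64, 96, 128], peak_gflops=100.0, bandwidth_gbs=10.0)` uses made-up machine parameters. In practice the peak and bandwidth would come from measurements on the target platform, and the traffic model would need to reflect the actual kernel and cache hierarchy.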

All researchers acknowledge the support from the EHFARS project funded by the German Ministry of Education and Research BMBF.

E.S. Quintana-Ortí: Supported by the CICYT project TIN2014-53495-R of the Ministerio de Economía y Competitividad and FEDER.



Author information

Authors and Affiliations

Instituto de Computación, Universidad de la República, 11300, Montevideo, Uruguay
Pablo Ezzatti & Juan P. Silva
Dep. de Ingeniería y Ciencia de la Computación, Universidad Jaime I, 12701, Castellón, Spain
Enrique S. Quintana-Ortí
Max Planck Institute for Dynamics of Complex Technical Systems, 39106, Magdeburg, Germany
Peter Benner & Alfredo Remón


Corresponding author

Correspondence to Alfredo Remón.

Editor information

Editors and Affiliations

Carlos III University of Madrid, Getafe, Spain
Jesus Carretero
Carlos III University of Madrid, Getafe, Spain
Javier Garcia-Blas
Mathematical Support for Computers, N. I. Lobachevsky State University of Nizhny Novgorod, Nizhniy Novgorod, Russia
Victor Gergel
Research Computing Center (RCC), Moscow State University, Moscow, Russia
Vladimir Voevodin
Research Computing Center (RCC), Moscow State University, Moscow, Russia
Iosif Meyerov
E.U. Politécnica, Universidad de Extremadura, Cáceres, Spain
Juan A. Rico-Gallego
Ingeniería de Sistemas Informáticos, Universidad de Extremadura, Cáceres, Spain
Juan C. Díaz-Martín
Universitat Politècnica de València, Valencia, Spain
Pedro Alonso
Distributed and Parallel Systems Group, Institute for Computer Science, Innsbruck, Austria
Juan Durillo
Carlos III University of Madrid, Getafe, Spain
José Daniel Garcia Sánchez
UCD School of Computer Science, University College Dublin, Dublin, Ireland
Alexey L. Lastovetsky
University of Calabria, Rende (CS), Italy
Fabrizio Marozzo
Information Science and Engineering, Central South University, Changsha, Hunan, China
Qin Liu
Information Science and Engineering, Central South University, Changsha, Hunan, China
Zakirul Alam Bhuiyan
Ludwig Maximilian University of Munich, Munich, Germany
Karl Fürlinger
Informatik 10 - Rechnertechnik, Technische Universität München, Munich, Germany
Josef Weidendorfer
High Performance Computing Center (HLRS), Stuttgart, Germany
José Gracia

About this paper

Cite this paper

Benner, P., Ezzatti, P., Quintana-Ortí, E.S., Remón, A., Silva, J.P. (2016). Tuning the Blocksize for Dense Linear Algebra Factorization Routines with the Roofline Model. In: Carretero, J., et al. Algorithms and Architectures for Parallel Processing. ICA3PP 2016. Lecture Notes in Computer Science, vol 10049. Springer, Cham. https://doi.org/10.1007/978-3-319-49956-7_2

DOI: https://doi.org/10.1007/978-3-319-49956-7_2
Published: 19 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49955-0
Online ISBN: 978-3-319-49956-7
eBook Packages: Computer Science; Computer Science (R0)
