Abstract
The optimization of dense linear algebra operations is a fundamental task in many scientific computing applications. The Roofline Model (RM) is a tool that estimates the performance a computational kernel can attain on a given hardware platform. The RM can therefore be used to investigate whether a computational kernel can be accelerated further. We present an approach, based on the RM, to optimize the algorithmic parameters of dense linear algebra kernels. In particular, we perform a basic analysis to identify the optimal values for the kernel parameters. As a proof of concept, we apply this technique to optimize a blocked algorithm for matrix inversion via Gauss-Jordan elimination. In addition, we extend the technique to multi-block computational kernels. An experimental evaluation validates the method and demonstrates its practicality. The results can be extended to other computational kernels similar to Gauss-Jordan elimination, such as matrix factorizations and the solution of linear least squares problems.
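To make the idea concrete, the following minimal Python sketch evaluates the standard roofline bound, attainable performance = min(peak compute rate, memory bandwidth × arithmetic intensity), and uses it to pick a block size. It is an illustration under stated assumptions, not the analysis carried out in the paper: the kernel is a generic rank-b matrix update rather than the Gauss-Jordan kernels, the memory-traffic count is idealized (each operand moves once), and the function names and hardware figures are hypothetical.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def update_intensity(n, b, bytes_per_word=8):
    """Flops per byte for C := C - A * B, with C n-by-n, A n-by-b, B b-by-n.

    Assumption: each operand moves between memory and the processor
    exactly once (C is both read and written). Real traffic depends on
    the cache hierarchy and the implementation.
    """
    flops = 2.0 * n * n * b
    traffic_bytes = bytes_per_word * (2.0 * n * n + 2.0 * n * b)
    return flops / traffic_bytes


def smallest_compute_bound_block(n, peak_gflops, bandwidth_gbs, candidates):
    """Return the first candidate block size whose memory roof reaches the
    compute roof, or None if no candidate does."""
    for b in candidates:
        if bandwidth_gbs * update_intensity(n, b) >= peak_gflops:
            return b
    return None


if __name__ == "__main__":
    # Hypothetical platform: 100 GFLOPS peak, 10 GB/s memory bandwidth.
    peak, bandwidth, n = 100.0, 10.0, 4096
    for b in (16, 32, 64, 128, 256):
        bound = attainable_gflops(peak, bandwidth, update_intensity(n, b))
        print(f"b = {b:4d}  roofline bound = {bound:6.1f} GFLOPS")
    print("smallest compute-bound block size:",
          smallest_compute_bound_block(n, peak, bandwidth, (16, 32, 64, 128, 256)))
```

Under this simplified traffic model the arithmetic intensity grows roughly linearly with the block size when n is much larger than b, so small blocks sit under the memory roof and larger ones reach the compute roof. Any real tuning would also have to account for the other kernels in the algorithm, which this sketch ignores.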
All researchers acknowledge the support from the EHFARS project funded by the German Ministry of Education and Research BMBF.
E.S. Quintana-Ortí was supported by the CICYT project TIN2014-53495-R of the Ministerio de Economía y Competitividad and FEDER.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Benner, P., Ezzatti, P., Quintana-Ortí, E.S., Remón, A., Silva, J.P. (2016). Tuning the Blocksize for Dense Linear Algebra Factorization Routines with the Roofline Model. In: Carretero, J., et al. Algorithms and Architectures for Parallel Processing. ICA3PP 2016. Lecture Notes in Computer Science, vol 10049. Springer, Cham. https://doi.org/10.1007/978-3-319-49956-7_2
DOI: https://doi.org/10.1007/978-3-319-49956-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49955-0
Online ISBN: 978-3-319-49956-7
eBook Packages: Computer Science, Computer Science (R0)