Tuning the Blocksize for Dense Linear Algebra Factorization Routines with the Roofline Model

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10049)

Abstract

The optimization of dense linear algebra operations is a fundamental task in the solution of many scientific computing applications. The Roofline Model (RM) is a tool that estimates the performance a computational kernel can attain on a given hardware platform. The RM can therefore be used to investigate whether a computational kernel can be accelerated further. We present an approach, based on the RM, to optimize the algorithmic parameters of dense linear algebra kernels. In particular, we perform a basic analysis to identify the optimal values for the kernel parameters. As a proof of concept, we apply this technique to optimize a blocked algorithm for matrix inversion via Gauss-Jordan elimination. In addition, we extend this technique to multi-block computational kernels. An experimental evaluation validates the method and demonstrates its practicality. We remark that the results obtained can be extended to other computational kernels similar to Gauss-Jordan elimination, such as matrix factorizations and the solution of linear least squares problems.
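To make the idea in the abstract concrete, the following is a minimal sketch of the standard roofline bound, in which attainable performance is the minimum of the peak floating-point rate and the product of memory bandwidth and arithmetic intensity. It is not the authors' procedure: the function names, the machine figures (500 GFLOP/s, 50 GB/s), and the intensity model for a rank-b update are all illustrative assumptions.

```python
def roofline_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline bound in GFLOP/s: min(peak, bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def rank_b_update_intensity(b, n=4096, bytes_per_elem=8):
    """Hypothetical intensity model (flops per byte) for C += A * B,
    with C of size n x n, A of size n x b, and B of size b x n.
    Assumes C is read and written once and A and B are read once."""
    flops = 2.0 * n * n * b
    traffic = bytes_per_elem * (2.0 * n * n + 2.0 * n * b)
    return flops / traffic


def smallest_compute_bound_block(peak_gflops, bandwidth_gbs, candidates,
                                 intensity_of=rank_b_update_intensity):
    """Return the smallest candidate blocksize whose roofline bound
    reaches the peak rate, or None if no candidate does."""
    for b in sorted(candidates):
        bound = roofline_gflops(peak_gflops, bandwidth_gbs, intensity_of(b))
        if bound >= peak_gflops:
            return b
    return None


if __name__ == "__main__":
    # Assumed machine: 500 GFLOP/s peak, 50 GB/s memory bandwidth.
    for b in (32, 64, 128, 256):
        bound = roofline_gflops(500.0, 50.0, rank_b_update_intensity(b))
        print(f"b = {b:4d}  roofline bound = {bound:7.1f} GFLOP/s")
    print(smallest_compute_bound_block(500.0, 50.0, [32, 64, 128, 256]))
```

Under these assumed figures the ridge point sits at 10 flops per byte, so small blocksizes are memory bound and the bound only reaches the peak once the blocksize pushes the intensity past that point. A real tuning study would replace both the machine figures and the intensity model with measured values for the kernel at hand.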

All researchers acknowledge the support from the EHFARS project funded by the German Ministry of Education and Research (BMBF).

E.S. Quintana-Ortí—Supported by the CICYT project TIN2014-53495-R of the Ministerio de Economía y Competitividad and FEDER.

Author information

Corresponding author

Correspondence to Alfredo Remón.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Benner, P., Ezzatti, P., Quintana-Ortí, E.S., Remón, A., Silva, J.P. (2016). Tuning the Blocksize for Dense Linear Algebra Factorization Routines with the Roofline Model. In: Carretero, J., et al. Algorithms and Architectures for Parallel Processing. ICA3PP 2016. Lecture Notes in Computer Science, vol 10049. Springer, Cham. https://doi.org/10.1007/978-3-319-49956-7_2

  • DOI: https://doi.org/10.1007/978-3-319-49956-7_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49955-0

  • Online ISBN: 978-3-319-49956-7

  • eBook Packages: Computer Science, Computer Science (R0)
