A Fully Empirical Autotuned Dense QR Factorization for Multicore Architectures

Agullo, Emmanuel; Dongarra, Jack; Nath, Rajib; Tomov, Stanimire

doi:10.1007/978-3-642-23397-5_19


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6853)

Included in the following conference series:

European Conference on Parallel Processing


Abstract

Tuning numerical libraries has become more difficult over time as systems have grown more sophisticated. In particular, modern multicore machines make the behaviour of algorithms hard to forecast and model. In this paper, we tackle the issue of tuning a dense QR factorization on multicore architectures using a fully empirical approach. We exhibit a few strong empirical properties that enable us to prune the search space efficiently. Our method is automatic, fast and reliable. The tuning process is performed entirely at install time and takes less than one hour and ten minutes on five out of seven platforms. We achieve an average performance ranging from 97% to 100% of the optimum performance, depending on the platform. This work is a basis for autotuning the PLASMA library and enabling easy performance portability across hardware systems.
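
As a rough illustration of what empirical tuning with a pruned search space can look like, the following minimal Python sketch ranks candidate configurations with a cheap benchmark, keeps only the best few, and then runs a costlier benchmark on the survivors. It is not the authors' procedure: the two-stage scheme, the function names `prune` and `autotune`, the `(nb, ib)` parameter pair and the synthetic score functions are all assumptions made for this example, and the empirical properties the paper relies on are not reproduced here.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical tuning parameters: (outer tile size nb, inner block size ib).
Config = Tuple[int, int]


def prune(candidates: Iterable[Config],
          cheap_score: Callable[[Config], float],
          keep: int) -> List[Config]:
    """Rank candidates by a cheap benchmark (higher is better), keep the best few."""
    ranked = sorted(candidates, key=cheap_score, reverse=True)
    return ranked[:keep]


def autotune(candidates: Iterable[Config],
             cheap_score: Callable[[Config], float],
             full_score: Callable[[Config], float],
             keep: int = 3) -> Config:
    """Return the surviving candidate with the best full-benchmark score."""
    survivors = prune(candidates, cheap_score, keep)
    if not survivors:
        raise ValueError("no candidate configurations supplied")
    return max(survivors, key=full_score)


if __name__ == "__main__":
    # Synthetic stand-ins for measured performance, used only to make
    # the sketch runnable; a real tuner would time actual kernels here.
    grid = [(nb, ib) for nb in (64, 128, 256) for ib in (16, 32)]
    cheap = lambda c: -abs(c[0] - 128) - abs(c[1] - 32)
    full = lambda c: -abs(c[0] - 128) - 2 * abs(c[1] - 32)
    print(autotune(grid, cheap, full))
```

In a real setting the two score functions would wrap timed runs of a kernel and of the full factorization, and the value of `keep` would trade tuning time against the risk of discarding the best configuration.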



Author information

Authors and Affiliations

LaBRI and INRIA Bordeaux Sud Ouest, France
Emmanuel Agullo
University of Tennessee, USA
Jack Dongarra, Rajib Nath & Stanimire Tomov

Editor information

Editors and Affiliations

Equipe Runtime, INRIA Bordeaux Sud-Ouest, 33405, Talence Cedex, France
Emmanuel Jeannot & Raymond Namyst
Equipe HIEPACS, INRIA Bordeaux Sud-Ouest, 33405, Talence Cedex, France
Jean Roman

About this paper

Cite this paper

Agullo, E., Dongarra, J., Nath, R., Tomov, S. (2011). A Fully Empirical Autotuned Dense QR Factorization for Multicore Architectures. In: Jeannot, E., Namyst, R., Roman, J. (eds) Euro-Par 2011 Parallel Processing. Euro-Par 2011. Lecture Notes in Computer Science, vol 6853. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23397-5_19

DOI: https://doi.org/10.1007/978-3-642-23397-5_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23396-8
Online ISBN: 978-3-642-23397-5
eBook Packages: Computer Science, Computer Science (R0)
