Performance Modeling and Optimal Block Size Selection for the Small-Bulge Multishift QR Algorithm

Yamamoto, Yusaku

doi:10.1007/11946441_44

Yusaku Yamamoto²²

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 4330))

Included in the following conference series:

International Symposium on Parallel and Distributed Processing and Applications

607 Accesses
1 Citations

Abstract

The small-bulge multishift QR algorithm proposed by Braman, Byers and Mathias is one of the most efficient algorithms for computing the eigenvalues of nonsymmetric matrices on processors with hierarchical memory. However, to fully extract its potential performance, it is crucial to choose the block size m properly according to the target architecture and the matrix size n. In this paper, we construct a performance model for this algorithm. The model has a hierarchical structure that reflects the structure of the original algorithm and given n, m and the performance data of the basic components of the algorithm, such as the level-3 BLAS routines and the double implicit shift QR routine, predicts the total execution time. Experiments on SMP machines with PowerPC G5 and Opteron processors show that the variation of the execution time as a function of m predicted by the model agrees well with the measurements. Thus our model can be used to automatically select the optimal value of m for a given matrix size on a given architecture.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Francis, J.G.F.: The QR transformation. a unitary analogue to the LR transformation. I. Comput. J. 4, 265–271 (1961)
Article MathSciNet Google Scholar
Francis, J.G.F.: The QR transformation. II. Comput. J. 4, 332–345 (1961)
Article MathSciNet Google Scholar
Kublanovskaya, V.N.: On some algorithms for the solution of the complete eigenvalue problem. U.S.S.R. Comput. Math. and Math. Phys. 3, 637–657 (1961)
Google Scholar
Bai, Z., Demmel, J.: On a block implementation of Hessenberg QR iteration. Int. J. of High Speed Computing 1, 97–112 (1989)
Article MATH Google Scholar
Braman, K., Byers, R., Mathias, R.: The multishift QR algorithm. part I: Maintaining well-focused shifts and level 3 performance. SIAM Journal on Matrix Analysis and Applications 23, 929–947 (2002)
Article MATH MathSciNet Google Scholar
Dubrulle, A.: The multishift QR algorithm: Is it worth the trouble? Palo Alto Scientific Center Report G320-3558x, IBM Corp. (1991)
Google Scholar
Henry, G., Watkins, D.S., Dongarra, J.: A parallel implementation of the nonsymmetric QR algorithm for distributed memory architectures. SIAM J. Sci. Comput. 24, 284–311 (2002)
Article MATH MathSciNet Google Scholar
Watkins, D.S.: Bidirectional chasing algorithms for the eigenvalue problem. SIAM J. Matrix Anal. Appl. 14, 166–179 (1993)
Article MATH MathSciNet Google Scholar
Watkins, D.S.: Shifting strategies for the parallel QR algorithm. SIAM J. Sci. Comput. 15, 953–958 (1994)
Article MATH MathSciNet Google Scholar
Demmel, J.W.: Applied Numerical Linear Algebra. SIAM, Philadelphia (1997)
MATH Google Scholar
Golub, G.H., van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)
MATH Google Scholar
Watkins, D.S.: The transmission of shifts and shift blurring in the QR algorithm. Linear Algebra and Its Applications 241/243, 877–896 (1996)
Article MathSciNet Google Scholar
Anderson, E., Bai, Z., Bischof, C., Demmel, J., Dongarra, J., Croz, J.D., Greenbaum, A., Hammarling, S., McKenney, A., Ostrouchov, S., Sorensen, D.: LAPACK User’s Guide. SIAM, Philadelphia (1992)
Google Scholar
Dackland, K., Kågström, B.: A hierarchical approach for performance analysis of ScaLAPACK-based routines using the distributed linear algebra machine. In: Madsen, K., Olesen, D., Waśniewski, J., Dongarra, J. (eds.) PARA 1996. LNCS, vol. 1184, pp. 187–195. Springer, Heidelberg (1996)
Google Scholar
Cuenca, J., Gimenez, D., Gonzalez, J.: Architecture of an automatically tuned linear algebra library. Parallel Computing 30, 187–210 (2004)
Article Google Scholar
Cuenca, J., Garcia, L.P., Gimenez, D.G.: Empirical modelling of parallel linear algebra routines. In: Wyrzykowski, R., Dongarra, J., Paprzycki, M., Waśniewski, J. (eds.) PPAM 2004. LNCS, vol. 3019, pp. 169–174. Springer, Heidelberg (2004)
Chapter Google Scholar
Katagiri, T., Kuroda, H., Kanada, Y.: A methodology for automatically tuned parallel tri-diagonalization on distributed memory parallel machines. In: Proceedings of VecPar2000, Faculdade de Engenharia da Universidade do Porto, Portugal, pp. 265–277 (2000)
Google Scholar
Yamamoto, Y.: Performance modeling and optimal block size selection for a BLAS-3 based tridiagonalization algorithm. In: Proceedings of HPC-Asia 2005, Beijing, pp. 249–256 (2005)
Google Scholar
Dongarra, J., Eijkhout, V.: Self-adapting numerical software for next generation applications. International Journal of High Performance Computing Applications 17, 125–131 (2003)
Article Google Scholar
Whaley, R., Petitet, A., Dongarra, J.: Automated empirical optimizations of software and the ATLAS project. Parallel Computing 27, 3–35 (2001)
Article MATH Google Scholar
Bilmes, J., Asanovic, K., Chin, C.W., Demmel, J.: Optimizing matrix multiply using PhiPAC: a portable, high-performance, ANSI-C coding methodology. In: Proceedings of the 11th International Conference on Supercomputing, Vienna, pp. 340–347 (1997)
Google Scholar
Kressner, D.: Numerical Methods for General and Structured Eigenvalue Problems. Springer, Heidelberg (2005)
MATH Google Scholar

Download references

Author information

Authors and Affiliations

Nagoya University, Nagoya, Aichi, 464-8603, Japan
Yusaku Yamamoto

Authors

Yusaku Yamamoto
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Shanghai Jiao Tong University, 200030, Shanghai, China
Minyi Guo
Department of Computer Science, St. Francis Xavier University, Antigonish, Canada
Laurence T. Yang
Dipartimento di Ingegneria dell’ Informazione - Second, University of Naples - Italy, Real Casa dell’Annunziata, via Roma, 29 81031, Aversa (CE), Italy
Beniamino Di Martino
Institute of Scientific Computing, University of Vienna, Nordbergstr. 15/C/3, A-1090, Vienna, Austria/JPL, Caltech, USA
Hans P. Zima
Computer Science Department, University of Tennessee, TN 37996-3450, Knoxville, USA
Jack Dongarra
Grid Computing Center, Shanghai Jiao Tong University, 800 Dongchuan Road, 200240, Shanghai, China
Feilong Tang

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Yamamoto, Y. (2006). Performance Modeling and Optimal Block Size Selection for the Small-Bulge Multishift QR Algorithm. In: Guo, M., Yang, L.T., Di Martino, B., Zima, H.P., Dongarra, J., Tang, F. (eds) Parallel and Distributed Processing and Applications. ISPA 2006. Lecture Notes in Computer Science, vol 4330. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11946441_44

Download citation

DOI: https://doi.org/10.1007/11946441_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68067-3
Online ISBN: 978-3-540-68070-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics