Skip to main content

Performance Modeling and Optimal Block Size Selection for the Small-Bulge Multishift QR Algorithm

  • Conference paper
Parallel and Distributed Processing and Applications (ISPA 2006)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 4330))

Abstract

The small-bulge multishift QR algorithm proposed by Braman, Byers and Mathias is one of the most efficient algorithms for computing the eigenvalues of nonsymmetric matrices on processors with hierarchical memory. However, to fully extract its potential performance, it is crucial to choose the block size m properly according to the target architecture and the matrix size n. In this paper, we construct a performance model for this algorithm. The model has a hierarchical structure that reflects the structure of the original algorithm and given n, m and the performance data of the basic components of the algorithm, such as the level-3 BLAS routines and the double implicit shift QR routine, predicts the total execution time. Experiments on SMP machines with PowerPC G5 and Opteron processors show that the variation of the execution time as a function of m predicted by the model agrees well with the measurements. Thus our model can be used to automatically select the optimal value of m for a given matrix size on a given architecture.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Francis, J.G.F.: The QR transformation. a unitary analogue to the LR transformation. I. Comput. J. 4, 265–271 (1961)

    Article  MathSciNet  Google Scholar 

  2. Francis, J.G.F.: The QR transformation. II. Comput. J. 4, 332–345 (1961)

    Article  MathSciNet  Google Scholar 

  3. Kublanovskaya, V.N.: On some algorithms for the solution of the complete eigenvalue problem. U.S.S.R. Comput. Math. and Math. Phys. 3, 637–657 (1961)

    Google Scholar 

  4. Bai, Z., Demmel, J.: On a block implementation of Hessenberg QR iteration. Int. J. of High Speed Computing 1, 97–112 (1989)

    Article  MATH  Google Scholar 

  5. Braman, K., Byers, R., Mathias, R.: The multishift QR algorithm. part I: Maintaining well-focused shifts and level 3 performance. SIAM Journal on Matrix Analysis and Applications 23, 929–947 (2002)

    Article  MATH  MathSciNet  Google Scholar 

  6. Dubrulle, A.: The multishift QR algorithm: Is it worth the trouble? Palo Alto Scientific Center Report G320-3558x, IBM Corp. (1991)

    Google Scholar 

  7. Henry, G., Watkins, D.S., Dongarra, J.: A parallel implementation of the nonsymmetric QR algorithm for distributed memory architectures. SIAM J. Sci. Comput. 24, 284–311 (2002)

    Article  MATH  MathSciNet  Google Scholar 

  8. Watkins, D.S.: Bidirectional chasing algorithms for the eigenvalue problem. SIAM J. Matrix Anal. Appl. 14, 166–179 (1993)

    Article  MATH  MathSciNet  Google Scholar 

  9. Watkins, D.S.: Shifting strategies for the parallel QR algorithm. SIAM J. Sci. Comput. 15, 953–958 (1994)

    Article  MATH  MathSciNet  Google Scholar 

  10. Demmel, J.W.: Applied Numerical Linear Algebra. SIAM, Philadelphia (1997)

    MATH  Google Scholar 

  11. Golub, G.H., van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)

    MATH  Google Scholar 

  12. Watkins, D.S.: The transmission of shifts and shift blurring in the QR algorithm. Linear Algebra and Its Applications 241/243, 877–896 (1996)

    Article  MathSciNet  Google Scholar 

  13. Anderson, E., Bai, Z., Bischof, C., Demmel, J., Dongarra, J., Croz, J.D., Greenbaum, A., Hammarling, S., McKenney, A., Ostrouchov, S., Sorensen, D.: LAPACK User’s Guide. SIAM, Philadelphia (1992)

    Google Scholar 

  14. Dackland, K., Kågström, B.: A hierarchical approach for performance analysis of ScaLAPACK-based routines using the distributed linear algebra machine. In: Madsen, K., Olesen, D., Waśniewski, J., Dongarra, J. (eds.) PARA 1996. LNCS, vol. 1184, pp. 187–195. Springer, Heidelberg (1996)

    Google Scholar 

  15. Cuenca, J., Gimenez, D., Gonzalez, J.: Architecture of an automatically tuned linear algebra library. Parallel Computing 30, 187–210 (2004)

    Article  Google Scholar 

  16. Cuenca, J., Garcia, L.P., Gimenez, D.G.: Empirical modelling of parallel linear algebra routines. In: Wyrzykowski, R., Dongarra, J., Paprzycki, M., Waśniewski, J. (eds.) PPAM 2004. LNCS, vol. 3019, pp. 169–174. Springer, Heidelberg (2004)

    Chapter  Google Scholar 

  17. Katagiri, T., Kuroda, H., Kanada, Y.: A methodology for automatically tuned parallel tri-diagonalization on distributed memory parallel machines. In: Proceedings of VecPar2000, Faculdade de Engenharia da Universidade do Porto, Portugal, pp. 265–277 (2000)

    Google Scholar 

  18. Yamamoto, Y.: Performance modeling and optimal block size selection for a BLAS-3 based tridiagonalization algorithm. In: Proceedings of HPC-Asia 2005, Beijing, pp. 249–256 (2005)

    Google Scholar 

  19. Dongarra, J., Eijkhout, V.: Self-adapting numerical software for next generation applications. International Journal of High Performance Computing Applications 17, 125–131 (2003)

    Article  Google Scholar 

  20. Whaley, R., Petitet, A., Dongarra, J.: Automated empirical optimizations of software and the ATLAS project. Parallel Computing 27, 3–35 (2001)

    Article  MATH  Google Scholar 

  21. Bilmes, J., Asanovic, K., Chin, C.W., Demmel, J.: Optimizing matrix multiply using PhiPAC: a portable, high-performance, ANSI-C coding methodology. In: Proceedings of the 11th International Conference on Supercomputing, Vienna, pp. 340–347 (1997)

    Google Scholar 

  22. Kressner, D.: Numerical Methods for General and Structured Eigenvalue Problems. Springer, Heidelberg (2005)

    MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yamamoto, Y. (2006). Performance Modeling and Optimal Block Size Selection for the Small-Bulge Multishift QR Algorithm. In: Guo, M., Yang, L.T., Di Martino, B., Zima, H.P., Dongarra, J., Tang, F. (eds) Parallel and Distributed Processing and Applications. ISPA 2006. Lecture Notes in Computer Science, vol 4330. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11946441_44

Download citation

  • DOI: https://doi.org/10.1007/11946441_44

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68067-3

  • Online ISBN: 978-3-540-68070-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics