On Aggressive Early Deflation in Parallel Variants of the QR Algorithm

  • Bo Kågström
  • Daniel Kressner
  • Meiyue Shao
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7133)


The QR algorithm computes the Schur form of a matrix and is by far the most popular approach for solving dense nonsymmetric eigenvalue problems. Multishift and aggressive early deflation (AED) techniques have led to significantly more efficient sequential implementations of the QR algorithm during the last decade. More recently, these techniques have been incorporated in a novel parallel QR algorithm on hybrid distributed memory HPC systems. While leading to significant performance improvements, it has turned out that AED may become a computational bottleneck as the number of processors increases. In this paper, we discuss a two-level approach for performing AED in a parallel environment, where the lower level consists of a novel combination of AED with the pipelined QR algorithm implemented in the ScaLAPACK routine PDLAHQR. Numerical experiments demonstrate that this new implementation further improves the performance of the parallel QR algorithm.


Keywords: Execution Time, Crossover Point, Orthogonal Transformation, Parallel Variant, Matrix Eigenvalue Problem
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Bo Kågström (1)
  • Daniel Kressner (2)
  • Meiyue Shao (1)

  1. Department of Computing Science and HPC2N, Umeå University, Umeå, Sweden
  2. Seminar for Applied Mathematics, ETH Zürich, Switzerland
