High Performance Cholesky Factorization via Blocking and Recursion That Uses Minimal Storage

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1947))

Abstract

We present a high performance Cholesky factorization algorithm, called BPC for Blocked Packed Cholesky, which performs as well as or better than the LAPACK DPOTRF subroutine, but with about the same memory requirements as the LAPACK DPPTRF subroutine, which runs at level 2 BLAS speed. Algorithm BPC calls only DGEMM and level 3 kernel routines. It combines a recursive algorithm with blocking and a recursive packed data format. A full analysis of how to overcome the non-linear addressing overhead imposed by recursion is given and discussed. Finally, since BPC uses GEMM to a great extent, we easily get a considerable amount of SMP parallelism from an SMP GEMM.
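
The recursive structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' BPC code: it works on full storage held in nested Python lists, uses plain loops where BPC would call DGEMM and level 3 kernels, and packs the result in the ordinary column-wise packed layout rather than the recursive packed format. The names `cholesky_kernel`, `recursive_cholesky`, `pack_lower` and the `block` parameter are hypothetical and chosen only for this illustration.

```python
import math


def cholesky_kernel(a):
    """Unblocked Cholesky of a small symmetric positive definite block.

    Returns the lower-triangular factor L with L * L^T == a.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][p] ** 2 for p in range(j))
        l[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(l[i][p] * l[j][p] for p in range(j))
            l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def recursive_cholesky(a, block=2):
    """Recursive Cholesky sketch; `block` is an assumed cutoff for the kernel."""
    n = len(a)
    if n <= block:
        return cholesky_kernel(a)
    k = n // 2
    m = n - k
    a11 = [row[:k] for row in a[:k]]
    a21 = [row[:k] for row in a[k:]]
    a22 = [row[k:] for row in a[k:]]

    # Step 1: factor the leading block recursively.
    l11 = recursive_cholesky(a11, block)

    # Step 2: triangular solve L21 * L11^T = A21, one row at a time.
    l21 = []
    for row in a21:
        x = []
        for j in range(k):
            s = sum(x[p] * l11[j][p] for p in range(j))
            x.append((row[j] - s) / l11[j][j])
        l21.append(x)

    # Step 3: symmetric update A22 - L21 * L21^T (a GEMM-like product).
    s22 = [
        [a22[i][j] - sum(l21[i][p] * l21[j][p] for p in range(k)) for j in range(m)]
        for i in range(m)
    ]

    # Step 4: factor the updated trailing block recursively.
    l22 = recursive_cholesky(s22, block)

    top = [l11[i] + [0.0] * m for i in range(k)]
    bottom = [l21[i] + l22[i] for i in range(m)]
    return top + bottom


def pack_lower(l):
    """Store the lower triangle column by column in n*(n+1)/2 entries."""
    n = len(l)
    return [l[i][j] for j in range(n) for i in range(j, n)]
```

Calling `pack_lower(recursive_cholesky(a))` on an n-by-n symmetric positive definite `a` yields n(n+1)/2 numbers instead of n², which is the storage saving that packed formats target; most of the arithmetic sits in the step 3 update, the part that a tuned matrix-multiply routine would carry in a production code.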

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gustavson, F., Jonsson, I. (2001). High Performance Cholesky Factorization via Blocking and Recursion That Uses Minimal Storage. In: Sørevik, T., Manne, F., Gebremedhin, A.H., Moe, R. (eds) Applied Parallel Computing. New Paradigms for HPC in Industry and Academia. PARA 2000. Lecture Notes in Computer Science, vol 1947. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-70734-4_12

  • DOI: https://doi.org/10.1007/3-540-70734-4_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41729-3

  • Online ISBN: 978-3-540-70734-9

  • eBook Packages: Springer Book Archive
