Prospectus for the Next LAPACK and ScaLAPACK Libraries

  • Conference paper
In: Applied Parallel Computing. State of the Art in Scientific Computing (PARA 2006)

Abstract

New releases of the widely used LAPACK and ScaLAPACK numerical linear algebra libraries are planned. Based on an ongoing user survey (www.netlib.org/lapack-dev) and research by many people, we propose the following improvements: faster algorithms, including better numerical methods, memory hierarchy optimizations, parallelism, and automatic performance tuning to accommodate new architectures; more accurate algorithms, including better numerical methods and use of extra precision; expanded functionality, including updating and downdating, new eigenproblems, etc., and putting more of LAPACK into ScaLAPACK; and improved ease of use, e.g., via friendlier interfaces in multiple languages. To accomplish these goals we are also relying on better software engineering techniques and contributions from collaborators at many institutions.

Editor information

Bo Kågström, Erik Elmroth, Jack Dongarra, Jerzy Waśniewski

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Demmel, J.W. et al. (2007). Prospectus for the Next LAPACK and ScaLAPACK Libraries. In: Kågström, B., Elmroth, E., Dongarra, J., Waśniewski, J. (eds) Applied Parallel Computing. State of the Art in Scientific Computing. PARA 2006. Lecture Notes in Computer Science, vol 4699. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75755-9_2

  • DOI: https://doi.org/10.1007/978-3-540-75755-9_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75754-2

  • Online ISBN: 978-3-540-75755-9

  • eBook Packages: Computer Science, Computer Science (R0)
