Parallel Factorization Algorithms with Algorithmic Blocking

  • Chapter
Parallel Numerical Computation with Applications

Part of the book series: The Springer International Series in Engineering and Computer Science (SECS, volume 515)

Abstract

LU and QR factorizations are the most widely used methods for solving dense linear systems of equations, and they have been extensively studied and implemented on vector and parallel computers. Because each parallel computer has a very different ratio of computation to communication performance, the optimal computational block size, which yields the maximum performance of an algorithm, differs from machine to machine. The data matrix must therefore be distributed with the machine-specific optimal block size before the computation; a block size that is too small or too large makes it nearly impossible to obtain good performance on a given machine. In such a case, obtaining better performance may require a complete redistribution of the data matrix.
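
To make the dependence on the distribution block size concrete, the sketch below shows the usual two-dimensional block-cyclic mapping used by ScaLAPACK-style libraries: entries are grouped into nb x nb blocks, and block (I, J) lives on process (I mod P, J mod Q) of a P x Q process grid. Because ownership depends on nb, changing the block size moves data between processes. This is only an illustrative Python sketch written for this summary; the function name and parameters are not taken from the chapter.

    # Illustrative sketch (not from the chapter): 2-D block-cyclic ownership.
    # Entries are grouped into nb x nb blocks; block (I, J) is assigned to
    # process (I mod P, J mod Q) on a P x Q process grid.
    def owner(i, j, nb, P, Q):
        """Return the (row, col) process coordinates owning matrix entry (i, j)."""
        block_row = i // nb          # index of the block containing row i
        block_col = j // nb          # index of the block containing column j
        return (block_row % P, block_col % Q)

    # Example: on a 2 x 3 process grid, entry (5, 9) lives on a different
    # process when the distribution block size changes from 2 to 4, which is
    # why a fixed physical block size can force a full redistribution.
    print(owner(5, 9, nb=2, P=2, Q=3))   # -> (0, 1)
    print(owner(5, 9, nb=4, P=2, Q=3))   # -> (1, 2)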

In this chapter, we present parallel LU and QR factorization algorithms with an “algorithmic blocking” strategy on a two-dimensional block-cyclic data distribution. With algorithmic blocking, it is possible to obtain the best performance irrespective of the physical block size. The algorithms are implemented and compared with the ScaLAPACK factorization routines on the Intel Paragon computer.
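
As a rough illustration of how a computational block size can be chosen independently of the storage layout, here is a minimal serial Python sketch of right-looking blocked LU factorization without pivoting. The loop structure is standard textbook material, not the chapter's parallel algorithm, and the block-size parameter nb_alg is a name introduced here purely for illustration.

    # Minimal serial sketch of right-looking blocked LU (no pivoting), assuming NumPy.
    # nb_alg is the algorithmic block size: it groups updates into matrix-matrix
    # operations and can be tuned per machine independently of the data layout.
    import numpy as np

    def blocked_lu(A, nb_alg):
        """Overwrite A with its LU factors: unit-lower L below, U on/above the diagonal."""
        n = A.shape[0]
        for k in range(0, n, nb_alg):
            e = min(k + nb_alg, n)
            # 1) Unblocked LU of the diagonal block A[k:e, k:e].
            for j in range(k, e):
                A[j+1:e, j] /= A[j, j]
                A[j+1:e, j+1:e] -= np.outer(A[j+1:e, j], A[j, j+1:e])
            if e < n:
                U_kk = np.triu(A[k:e, k:e])
                L_kk = np.tril(A[k:e, k:e], -1) + np.eye(e - k)
                # 2) Column panel: solve X * U_kk = A[e:, k:e] for the L panel.
                A[e:, k:e] = np.linalg.solve(U_kk.T, A[e:, k:e].T).T
                # 3) Row panel: solve L_kk * Y = A[k:e, e:] for the U panel.
                A[k:e, e:] = np.linalg.solve(L_kk, A[k:e, e:])
                # 4) Trailing-matrix update: the matrix-matrix kernel that blocking feeds.
                A[e:, e:] -= A[e:, k:e] @ A[k:e, e:]
        return A

    # Quick self-check: L * U should reconstruct the original matrix.
    rng = np.random.default_rng(0)
    B = rng.standard_normal((8, 8))
    A = B @ B.T + 8 * np.eye(8)        # symmetric positive definite: safe without pivoting
    LU = blocked_lu(A.copy(), nb_alg=3)
    L = np.tril(LU, -1) + np.eye(8)
    U = np.triu(LU)
    print(np.allclose(L @ U, A))       # True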

Editor information

Tianruo Yang

Copyright information

© 1999 Springer Science+Business Media New York

About this chapter

Cite this chapter

Choi, J. (1999). Parallel Factorization Algorithms with Algorithmic Blocking. In: Yang, T. (eds) Parallel Numerical Computation with Applications. The Springer International Series in Engineering and Computer Science, vol 515. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-5205-5_2

  • DOI: https://doi.org/10.1007/978-1-4615-5205-5_2

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-7371-1

  • Online ISBN: 978-1-4615-5205-5

  • eBook Packages: Springer Book Archive
