Abstract
LU and QR factorizations are among the most widely used methods for solving dense linear systems of equations, and they have been extensively studied and implemented on vector and parallel computers. Since each parallel computer has a different ratio of computation to communication performance, the optimal computational block size, the one that yields an algorithm's maximum performance, differs from machine to machine. The data matrix must therefore be distributed with the machine-specific optimal block size before the computation begins. A block size that is too small or too large makes good performance on a machine nearly impossible, and obtaining better performance may then require a complete redistribution of the data matrix.
In this chapter, we present parallel LU and QR factorization algorithms that use an “algorithmic blocking” strategy on a 2-dimensional block-cyclic data distribution. With algorithmic blocking, the best performance can be obtained irrespective of the physical block size. The algorithms are implemented and compared with the ScaLAPACK factorization routines on the Intel Paragon computer.
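To make the 2-D block-cyclic distribution concrete, the following sketch (not taken from the chapter; function and variable names are illustrative assumptions) maps a global matrix entry onto a P x Q process grid with block size nb, which is the layout ScaLAPACK-style factorizations operate on:

```python
def block_cyclic_owner(i, j, nb, P, Q):
    """Return the process coordinates (p, q) and local indices (li, lj)
    of global entry (i, j) under a 2-D block-cyclic distribution with
    nb x nb blocks on a P x Q process grid."""
    # Which nb x nb block the entry falls in, globally.
    bi, bj = i // nb, j // nb
    # Block rows/columns are dealt out cyclically over the process grid.
    p, q = bi % P, bj % Q
    # Local block index on the owning process, plus the offset inside the block.
    li = (bi // P) * nb + (i % nb)
    lj = (bj // Q) * nb + (j % nb)
    return (p, q), (li, lj)

# Example: an 8 x 8 matrix with nb = 2 on a 2 x 2 process grid.
owner, local = block_cyclic_owner(5, 6, nb=2, P=2, Q=2)
```

Because the block size nb is baked into this mapping, changing it requires redistributing the matrix; the algorithmic-blocking approach of the chapter instead decouples the computational block size from this physical layout.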
Copyright information
© 1999 Springer Science+Business Media New York
Cite this chapter
Choi, J. (1999). Parallel Factorization Algorithms with Algorithmic Blocking. In: Yang, T. (ed.) Parallel Numerical Computation with Applications. The Springer International Series in Engineering and Computer Science, vol 515. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-5205-5_2
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-7371-1
Online ISBN: 978-1-4615-5205-5
eBook Packages: Springer Book Archive