Basic techniques for numerical linear algebra on bulk synchronous parallel computers

Bisseling, Rob H.

doi:10.1007/3-540-62598-4_78

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1196))

Included in the following conference series:

International Workshop on Numerical Analysis and Its Applications

Abstract

The bulk synchronous parallel (BSP) model promises scalable and portable software for a wide range of applications. A BSP computer consists of several processors, each with private memory, and a communication network that delivers access to remote memory in uniform time.
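As a rough illustration of how costs are usually counted in this model, the sketch below uses the standard BSP convention, which the abstract itself does not spell out: one superstep costs w + h·g + l, where w is the largest local workload in flops, h is the largest number of data words any processor sends or receives, g is the communication cost per word and l is the synchronisation cost, both expressed in flop time units. The function name and the sample values are illustrative assumptions, not taken from the paper.

```python
def superstep_cost(work, sent, received, g, l):
    """Cost of one superstep in flop units under the standard BSP convention.

    work, sent, received: per-processor lists (flops, words sent, words received).
    g, l: machine parameters (cost per word, cost per synchronisation).
    """
    w = max(work)  # the slowest processor determines the computation time
    # h-relation: the largest number of words any processor sends or receives
    h = max(max(s, r) for s, r in zip(sent, received))
    return w + h * g + l


# Hypothetical 4-processor superstep with made-up workloads and traffic.
cost = superstep_cost(
    work=[100, 80, 120, 90],
    sent=[10, 0, 5, 5],
    received=[0, 20, 0, 0],
    g=2.0,
    l=50.0,
)
print(cost)
```

Because communication is charged through the single number h, where the data goes does not matter, which reflects the uniform-time access to remote memory described above.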

Numerical linear algebra computations can benefit from the BSP model, both in terms of simplicity and efficiency. Dense LU decomposition and other computations can be made more efficient by using the new technique of two-phase randomised broadcasting, which is motivated by a cost analysis in the BSP model. For LU decomposition with partial pivoting, this technique reduces the communication time by a factor of (√p+1)/3, where p is the number of processors.
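The idea behind a two-phase broadcast can be sketched with a generic cost comparison. This is a simplified, deterministic illustration and not the paper's own analysis: it assumes a vector of n words is broadcast from one processor to all p processors, that n is divisible by p, and it omits the randomisation step. The function names are hypothetical. The last helper simply evaluates the factor (√p+1)/3 quoted above for LU decomposition with partial pivoting.

```python
import math


def one_phase_broadcast_cost(n, p, g, l):
    """Source sends all n words directly to each of the other p - 1 processors."""
    h = (p - 1) * n  # the source sends (p - 1) * n words in one superstep
    return h * g + l


def two_phase_broadcast_cost(n, p, g, l):
    """Phase 1: source scatters blocks of n/p words to the other processors.
    Phase 2: every processor sends its block to the other p - 1 processors."""
    block = n / p
    h = (p - 1) * block  # same h-relation in both phases
    return 2 * (h * g + l)


def lu_comm_reduction_factor(p):
    """Reduction factor for LU communication time as stated in the abstract."""
    return (math.sqrt(p) + 1) / 3


# Illustrative values only: n = 1024 words, p = 8 processors, g = 1, l = 0.
print(one_phase_broadcast_cost(1024, 8, 1.0, 0.0))
print(two_phase_broadcast_cost(1024, 8, 1.0, 0.0))
print(lu_comm_reduction_factor(64))
```

In this simplified setting the two-phase variant spreads the sending work over all processors, so its communication volume per processor is about 2n instead of (p − 1)n, at the price of one extra synchronisation.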

Theoretical analysis, together with benchmark values for machine parameters, can be used to predict execution time. Such predictions are verified by numerical experiments on a 64-processor Cray T3D. The experimental results confirm the advantage of two-phase randomised broadcasting.
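A minimal sketch of how such a prediction might be computed, assuming the analysis yields a total cost of the form a + b·g + c·l (a flops, b communicated words, c synchronisations) and that the benchmark supplies g, l and a computing rate in flop/s. All names and numbers below are placeholders chosen for illustration; they are not measured Cray T3D values and not results from the paper.

```python
def predict_seconds(flops, words, syncs, g, l, flop_rate):
    """Predicted run time in seconds for a BSP cost of flops + words*g + syncs*l.

    g and l are in flop time units; flop_rate is in flop/s.
    """
    total_flop_units = flops + words * g + syncs * l
    return total_flop_units / flop_rate


# Placeholder machine parameters and operation counts (assumptions).
t = predict_seconds(
    flops=1e9,
    words=1e6,
    syncs=100,
    g=10.0,
    l=1000.0,
    flop_rate=1e7,
)
print(f"predicted time: {t:.2f} s")
```

Comparing such a prediction with a measured run time is one way to check whether the cost analysis and the benchmarked parameters describe the machine adequately.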

Author information

Authors and Affiliations

Department of Mathematics, Utrecht University, P.O. Box 80010, 3508 TA Utrecht, the Netherlands
Rob H. Bisseling

Editor information

Lubin Vulkov, Jerzy Waśniewski, Plamen Yalamov

About this paper

Cite this paper

Bisseling, R.H. (1997). Basic techniques for numerical linear algebra on bulk synchronous parallel computers. In: Vulkov, L., Waśniewski, J., Yalamov, P. (eds) Numerical Analysis and Its Applications. WNAA 1996. Lecture Notes in Computer Science, vol 1196. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62598-4_78

DOI: https://doi.org/10.1007/3-540-62598-4_78
Published: 03 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62598-8
Online ISBN: 978-3-540-68326-1
eBook Packages: Springer Book Archive