Basic techniques for numerical linear algebra on bulk synchronous parallel computers

  • Conference paper
Numerical Analysis and Its Applications (WNAA 1996)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1196))

Abstract

The bulk synchronous parallel (BSP) model promises scalable and portable software for a wide range of applications. A BSP computer consists of several processors, each with private memory, and a communication network that delivers access to remote memory in uniform time.

Numerical linear algebra computations can benefit from the BSP model, both in terms of simplicity and efficiency. Dense LU decomposition and other computations can be made more efficient by using the new technique of two-phase randomised broadcasting, which is motivated by a cost analysis in the BSP model. For LU decomposition with partial pivoting, this technique reduces the communication time by a factor of (√p+1)/3, where p is the number of processors.
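The idea behind a two-phase broadcast can be illustrated with a small simulation. The sketch below is not the paper's algorithm: it assumes a deterministic cyclic split of the data instead of a randomised choice of intermediate processors, and the function names and the measure `h` (the maximum number of words any processor sends or receives in a superstep) are introduced here only for illustration. It compares a direct broadcast, where the source sends everything to everyone, with a scatter followed by an exchange of the scattered pieces.

```python
def one_phase_h(n, p):
    """Words sent by the source when it sends all n values to the other p-1 processors."""
    return n * (p - 1)


def two_phase_broadcast(data, p, source=0):
    """Simulate scatter + exchange; return (copies held by each processor, total h).

    Assumption: values are dealt out cyclically (value i goes to processor i % p).
    """
    n = len(data)

    # Phase 1: the source scatters the values over all p processors.
    sent, recv = [0] * p, [0] * p
    pieces = [[] for _ in range(p)]
    for i, x in enumerate(data):
        dest = i % p
        pieces[dest].append((i, x))
        if dest != source:
            sent[source] += 1
            recv[dest] += 1
    h1 = max(max(sent), max(recv))

    # Phase 2: every processor sends its piece to all other processors.
    sent, recv = [0] * p, [0] * p
    result = [[None] * n for _ in range(p)]
    for owner in range(p):
        for dest in range(p):
            for i, x in pieces[owner]:
                result[dest][i] = x
            if dest != owner:
                sent[owner] += len(pieces[owner])
                recv[dest] += len(pieces[owner])
    h2 = max(max(sent), max(recv))

    return result, h1 + h2


if __name__ == "__main__":
    data = list(range(12))
    copies, h_two = two_phase_broadcast(data, p=4)
    print("one-phase h:", one_phase_h(len(data), 4))
    print("two-phase h:", h_two)
```

In this toy setting the two-phase variant moves about 2n(p-1)/p words through the busiest processor instead of n(p-1), at the price of one extra synchronisation. The factor quoted in the abstract applies to the full LU decomposition and is not derived by this sketch.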

Theoretical analysis, together with benchmark values for machine parameters, can be used to predict execution time. Such predictions are verified by numerical experiments on a 64-processor Cray T3D. The experimental results confirm the advantage of two-phase randomised broadcasting.
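A prediction of this kind can be sketched with the standard BSP cost formula, in which a superstep with `w` local flops and an `h`-relation costs `w + h*g + l` flop time units. The symbols `g` (communication cost per word) and `l` (synchronisation cost) are the usual BSP machine parameters and do not appear in the abstract itself. The names `Machine` and `predict_seconds` and all numeric values below are hypothetical placeholders, not benchmark results for the Cray T3D.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    g: float                 # communication cost per data word, in flop time units
    l: float                 # synchronisation cost per superstep, in flop time units
    flops_per_second: float  # computing rate of one processor


def superstep_cost(w, h, machine):
    """Cost of one superstep in flop time units: computation + communication + sync."""
    return w + h * machine.g + machine.l


def predict_seconds(supersteps, machine):
    """Sum the superstep costs and convert flop time units to seconds.

    `supersteps` is a list of (w, h) pairs taken from a cost analysis of the algorithm.
    """
    total = sum(superstep_cost(w, h, machine) for w, h in supersteps)
    return total / machine.flops_per_second


if __name__ == "__main__":
    # Placeholder parameters chosen only to make the arithmetic easy to follow.
    toy = Machine(g=2.0, l=100.0, flops_per_second=1e6)
    print(predict_seconds([(1000, 50), (0, 200)], toy))
```

Comparing such a prediction with a measured run time is one way to check whether the cost analysis captures the dominant terms.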

Editor information

Lubin Vulkov Jerzy Waśniewski Plamen Yalamov

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bisseling, R.H. (1997). Basic techniques for numerical linear algebra on bulk synchronous parallel computers. In: Vulkov, L., Waśniewski, J., Yalamov, P. (eds) Numerical Analysis and Its Applications. WNAA 1996. Lecture Notes in Computer Science, vol 1196. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62598-4_78

  • DOI: https://doi.org/10.1007/3-540-62598-4_78

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-62598-8

  • Online ISBN: 978-3-540-68326-1

  • eBook Packages: Springer Book Archive
