# Predictable Parallel Performance: The BSP Model

## Abstract

There are three big challenges for mainstream parallel computing: building useful hardware platforms, designing programming models that are effective, and designing a software construction process that builds correctness into software. The first has largely been solved, at least for current technology. The second has been an active area of research for perhaps fifteen years, while work on the third has barely begun. In this chapter, we describe the Bulk Synchronous Parallel (BSP) model which, at present, represents the best compromise among programming models for simplicity, predictability, and performance. We describe the model from a software developer’s perspective and show how its high-level structure is used to build efficient implementations. Almost alone among programming models, BSP has an associated cost model, so the performance of programs can be predicted on any target without laborious benchmarking. Some progress towards software construction has also been made in the context of BSP.
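As a rough illustration of what such a cost model looks like, the sketch below uses the formulation commonly given in the BSP literature, where a superstep costs `w + h*g + l`: `w` is the largest local computation on any processor, `h` is the largest number of words any processor sends or receives, `g` is the machine's per-word communication cost, and `l` is its barrier synchronisation cost. The class and function names (`Machine`, `Superstep`, `program_cost`) and the numbers are assumptions made for this example, not taken from the chapter.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Machine:
    """Illustrative machine parameters (hypothetical names)."""
    g: float  # cost per word communicated, in units of one local operation
    l: float  # cost of one barrier synchronisation, same units


@dataclass(frozen=True)
class Superstep:
    """Illustrative per-superstep program parameters."""
    w: float  # maximum local computation on any processor
    h: float  # maximum words sent or received by any processor


def superstep_cost(step: Superstep, machine: Machine) -> float:
    # Standard BSP charge for one superstep: compute + communicate + synchronise.
    return step.w + step.h * machine.g + machine.l


def program_cost(steps: Iterable[Superstep], machine: Machine) -> float:
    # Supersteps run one after another, so their costs simply add up.
    return sum(superstep_cost(s, machine) for s in steps)


if __name__ == "__main__":
    # Made-up values: two supersteps on a machine with g = 4 and l = 100.
    steps = [Superstep(w=1000, h=50), Superstep(w=200, h=10)]
    print(program_cost(steps, Machine(g=4, l=100)))
```

Because the program contributes only `w` and `h` and the target contributes only `g` and `l`, the same program description can be re-costed for a different machine by swapping in that machine's two parameters.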

## Keywords

BSP, cluster computing, predictable performance, portability, synchronism
