Predictable Parallel Performance: The BSP Model

  • D. B. Skillicorn
Part of the Applied Optimization book series (APOP, volume 67)


There are three big challenges for mainstream parallel computing: building useful hardware platforms, designing effective programming models, and designing a software construction process that builds correctness into software. The first has largely been solved, at least for current technology. The second has been an active area of research for perhaps fifteen years, while work on the third has barely begun. In this chapter, we describe the Bulk Synchronous Parallel (BSP) model which, at present, represents the best compromise among programming models for simplicity, predictability, and performance. We describe the model from a software developer’s perspective and show how its high-level structure is used to build efficient implementations. Almost alone among programming models, BSP has an associated cost model, so the performance of programs can be predicted on any target without laborious benchmarking. Some progress towards software construction has also been made in the context of BSP.
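The cost model mentioned above charges each superstep its slowest local computation, plus communication priced by the machine parameter g (cost per word of the largest h-relation), plus the barrier latency l. A minimal sketch of this prediction, with illustrative (hypothetical) machine parameters and workloads rather than measured ones, might look like:

```python
# Sketch of the standard BSP cost model.  A program is a sequence of
# supersteps; each superstep costs  max_i(w_i) + g*h + l,  where w_i is
# the local computation on process i, h is the largest number of words
# any one process sends or receives, g is the per-word communication
# cost, and l is the barrier synchronisation cost.  All values are in
# units of basic computation steps.

def superstep_cost(work, h, g, l):
    """Predicted cost of one superstep: slowest local work, then
    communication of an h-relation, then the barrier."""
    return max(work) + g * h + l

def program_cost(supersteps, g, l):
    """Total predicted cost; supersteps is a list of (work_list, h) pairs."""
    return sum(superstep_cost(w, h, g, l) for w, h in supersteps)

# Illustrative example: two supersteps on 4 processes, with assumed
# machine parameters g = 4 steps/word and l = 100 steps.
steps = [
    ([1000, 1200, 900, 1100], 50),  # superstep 1: 50-word h-relation
    ([800, 800, 800, 800], 10),     # superstep 2: 10-word h-relation
]
print(program_cost(steps, g=4, l=100))  # (1200+200+100) + (800+40+100) = 2440
```

Because g and l are properties of the machine alone and w and h are properties of the program alone, the same two-line calculation predicts performance on any target once its g and l have been measured.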


Keywords: BSP, cluster computing, predictable performance, portability, synchronism





Copyright information

© Springer Science+Business Media Dordrecht 2002

Authors and Affiliations

  • D. B. Skillicorn
    1. Department of Computing and Information Science, Queen’s University, Kingston, Canada
