An Optimal Broadcast Algorithm Adapted to SMP Clusters

  • Jesper Larsson Träff
  • Andreas Ripke
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3666)


We describe and evaluate the adaptation of a new, optimal broadcast algorithm for “flat”, fully connected networks to clusters of SMP nodes. The optimal broadcast algorithm improves over other commonly used broadcast algorithms (pipelined binary trees, recursive halving) by up to a factor of two for the non-hierarchical (non-SMP) case. The algorithm is well suited for clusters of SMP nodes, since intra-node broadcast of relatively small blocks can take place concurrently with inter-node communication over the network. This new algorithm has been incorporated into a state-of-the-art MPI library. On a 32-node dual-processor AMD cluster with Myrinet interconnect, improvements of a factor of 1.5 over, for instance, a pipelined binary tree algorithm have been achieved, both with one and with two MPI processes per node.
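The "factor of two" can be made plausible with a simple round-counting sketch. It assumes a single-port model in which each process sends at most one block per communication round, and a message split into equal blocks. The formula for the optimal algorithm is the well-known lower bound for this model; the formula for the pipelined binary tree is a rough textbook-style estimate. Neither is taken from this abstract, and the function names are illustrative only.

```python
def ceil_log2(p: int) -> int:
    """Smallest q with 2**q >= p, computed with integers only."""
    return (p - 1).bit_length()


def optimal_rounds(n_blocks: int, p: int) -> int:
    """Lower bound on rounds to broadcast n_blocks to p processes
    in a single-port, fully connected network (assumed model)."""
    return n_blocks - 1 + ceil_log2(p)


def pipelined_binary_tree_rounds(n_blocks: int, p: int) -> int:
    """Rough estimate (assumption): every interior node forwards each
    block to two children in turn, so each block and each tree level
    costs about two rounds."""
    depth = p.bit_length() - 1  # floor(log2(p))
    return 2 * (n_blocks - 1) + 2 * depth


def round_ratio(n_blocks: int, p: int) -> float:
    """Ratio of estimated binary-tree rounds to optimal rounds."""
    return pipelined_binary_tree_rounds(n_blocks, p) / optimal_rounds(n_blocks, p)


if __name__ == "__main__":
    # 64 processes, e.g. 32 dual-processor nodes with two processes each
    for n in (1, 10, 100, 1000):
        print(n, optimal_rounds(n, 64), pipelined_binary_tree_rounds(n, 64))
```

Under these assumptions the ratio approaches two as the number of blocks grows. Measured improvements such as the factor of 1.5 quoted above are smaller because real networks also incur per-message latency and other overheads that this count ignores.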


Message Passing Interface · Collective Operation · Broadcast Algorithm · Binomial Tree · Communication Round
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Jesper Larsson Träff (1)
  • Andreas Ripke (1)
  1. C&C Research Laboratories, NEC Europe Ltd., Sankt Augustin, Germany
