An Optimal Broadcast Algorithm Adapted to SMP Clusters
We describe and and evaluate the adaption of a new, optimal broadcast algorithm for “flat”, fully connected networks to clusters of SMP nodes. The optimal broadcast algorithm improves over other commonly used broadcast algorithms (pipelined binary trees, recursive halving) by up to a factor of two for the non-hierarchical (non-SMP) case. The algorithm is well suited for clusters of SMP nodes, since intra-node broadcast of relatively small blocks can take place concurrently with inter-node communication over the network. This new algorithm has been incorporated into a state-of-the art MPI library. On a 32-node dual-processor AMD cluster with Myrinet interconnect, improvements of a factor of 1.5 over for instance a pipelined binary tree algorithm has been achieved, both for the case with one and with two MPI processes per node.
KeywordsMessage Passing Interface Collective Operation Broadcast Algorithm Binomial Tree Communication Round
Unable to display preview. Download preview PDF.
- 2.Chan, E.W., Heimlich, M.F., Purkayastha, A., van de Geijn, R.A.: On optimizing collective communication. In: Cluster 2004 (2004)Google Scholar
- 3.Gołebiewski, M., Ritzdorf, H., Träff, J.L., Zimmermann, F.: The MPI/SX implementation of MPI for NEC’s SX-6 and other NEC platforms. NEC Research & Development 44(1), 69–74 (2003)Google Scholar
- 8.Snir, M., Otto, S., Huss-Lederman, S., Walker, D., Dongarra, J.: MPI – The Complete Reference, 2nd edn. The MPI Core, vol. 1. MIT Press, Cambridge (1998)Google Scholar