Abstract
The collective reduce-scatter operation in MPI performs an element-wise reduction using a given associative (and possibly commutative) binary operation of a sequence of m-element vectors, and distributes the result in m i sized blocks over the participating processors. For the case where the number of processors is a power of two, the binary operation is commutative, and all resulting blocks have the same size, efficient, butterfly-like algorithms are well-known and implemented in good MPI libraries.
The contributions of this paper are threefold. First, we give a simple trick for extending the butterfly algorithm also to the case of non-commutative operations (which is advantageous also for the commutative case). Second, combining this with previous work, we give improved algorithms for the case where the number of processors is not a power of two. Third, we extend the algorithms also to the irregular case where the size of the resulting blocks may differ extremely.
For p processors the algorithm requires ⌈log 2 p ⌉ + (⌈log 2 p ⌉ - \(\lfloor log_2p \rfloor\)) communication rounds for the regular case, which may double for the irregular case (depending on the amount of irregularity). For vectors of size m with \(m = \sum^{p-1}_{i=0}m_i\) the total running time is O(log p + m), irrespective of whether the m i blocks are equal or not. The algorithm has been implemented, and on a small Myrinet cluster gives substantial improvements (up to a factor of 3 in the experiments reported) over other often used implementations. The reduce-scatter operation is a building block in the fence one-sided communication synchronization primitive, and for this application we also document worthwhile improvements over a previous implementation.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Bernaschi, M., Iannello, G., Lauria, M.: Efficient implementation of reduce-scatter in MPI. Technical report, University of Napoli (1997)
Gołebiewski, M., Ritzdorf, H., Träff, J.L., Zimmermann, F.: The MPI/SX implementation of MPI for NEC’s SX-6 and other NEC platforms. NEC Research & Development 44(1), 69–74 (2003)
Gropp, W., Huss-Lederman, S., Lumsdaine, A., Lusk, E., Nitzberg, B., Saphir, W., Snir, M.: MPI – The Complete Reference, 2nd edn. The MPI Extensions. MIT Press, Cambridge (1998)
Gropp, W.D., Ross, R., Miller, N.: Providing efficient I/O redundancy in MPI environments. In: Kranzlmüller, D., Kacsuk, P., Dongarra, J. (eds.) EuroPVM/MPI 2004. LNCS, vol. 3241, pp. 77–86. Springer, Heidelberg (2004)
Iannello, G.: Efficient algorithms for the reduce-scatter operation in LogGP. IEEE Transactions on Parallel and Distributed Systems 8(9), 970–982 (1997)
Leighton, F.T.: Introduction to Parallel Algorithms and Architechtures: Arrays, Trees, Hypercubes. Morgan Kaufmann Publishers, San Francisco (1992)
Rabenseifner, R., Träff, J.L.: More efficient reduction algorithms for message-passing parallel systems. In: Kranzlmüller, D., Kacsuk, P., Dongarra, J. (eds.) EuroPVM/MPI 2004. LNCS, vol. 3241, pp. 36–46. Springer, Heidelberg (2004)
Snir, M., Otto, S., Huss-Lederman, S., Walker, D., Dongarra, J.: MPI – The Complete Reference, 2nd edn. The MPI Core, vol. 1. MIT Press, Cambridge (1998)
Thakur, R., Gropp, W.D., Rabenseifner, R.: Improving the performance of collective operations in MPICH. International Journal on High Performance Computing Applications 19, 49–66 (2004)
Thakur, R., Gropp, W.D., Toonen, B.: Minimizing synchronization overhead in the implementation of MPI one-sided communication. In: Kranzlmüller, D., Kacsuk, P., Dongarra, J. (eds.) EuroPVM/MPI 2004. LNCS, vol. 3241, pp. 57–67. Springer, Heidelberg (2004)
Träff, J.L.: Hierarchical gather/scatter algorithms with graceful degradation. In: International Parallel and Distributed Processing Symposium, IPDPS 2004 (2004)
Träff, J.L., Ritzdorf, H., Hempel, R.: The implementation of MPI-2 one-sided communication for the NEC SX-5. In: Supercomputing (2000), http://www.sc2000.org/proceedings/techpapr/index.htm#01
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Träff, J.L. (2005). An Improved Algorithm for (Non-commutative) Reduce-Scatter with an Application. In: Di Martino, B., Kranzlmüller, D., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2005. Lecture Notes in Computer Science, vol 3666. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11557265_20
Download citation
DOI: https://doi.org/10.1007/11557265_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29009-4
Online ISBN: 978-3-540-31943-6
eBook Packages: Computer ScienceComputer Science (R0)