An Improved Algorithm for (Non-commutative) Reduce-Scatter with an Application

Träff, Jesper Larsson

doi:10.1007/11557265_20

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 3666)

Included in the following conference series:

European Parallel Virtual Machine / Message Passing Interface Users’ Group Meeting

Abstract

The collective reduce-scatter operation in MPI performs an element-wise reduction, using a given associative (and possibly commutative) binary operation, of a sequence of \(m\)-element vectors, and distributes the result in blocks of size \(m_i\) over the participating processors. For the case where the number of processors is a power of two, the binary operation is commutative, and all resulting blocks have the same size, efficient, butterfly-like algorithms are well known and implemented in good MPI libraries.
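To make the semantics of the operation concrete, the following is a minimal sequential sketch in Python of what reduce-scatter computes. It is a reference model only, not an MPI implementation and not the algorithm of the paper; the function name `reduce_scatter_reference` and its parameters are hypothetical names chosen for illustration. The reduction is applied in rank order, which is what matters when the operation is associative but not commutative.

```python
from functools import reduce
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def reduce_scatter_reference(
    vectors: Sequence[Sequence[T]],
    block_sizes: Sequence[int],
    op: Callable[[T, T], T],
) -> List[List[T]]:
    """Sequential model of reduce-scatter (illustrative, hypothetical helper).

    vectors[r] is the m-element input vector of processor r.
    block_sizes[i] is m_i, the size of the result block for processor i.
    """
    p = len(vectors)
    m = sum(block_sizes)
    assert len(block_sizes) == p, "one block size per processor"
    assert all(len(v) == m for v in vectors), "every input vector has m elements"

    # Element-wise reduction in rank order 0, 1, ..., p-1.
    # The order is significant when op is not commutative.
    reduced = [reduce(op, (v[j] for v in vectors)) for j in range(m)]

    # Scatter: processor i receives the next m_i consecutive elements.
    blocks: List[List[T]] = []
    start = 0
    for size in block_sizes:
        blocks.append(reduced[start:start + size])
        start += size
    return blocks


# String concatenation is associative but not commutative.
example = reduce_scatter_reference(
    [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]],
    block_sizes=[1, 0, 2],  # irregular case: blocks of different sizes
    op=lambda x, y: x + y,
)
```

Here `example[0]` is the block for processor 0, `example[1]` is empty because its block size is zero, and `example[2]` holds the last two reduced elements.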

The contributions of this paper are threefold. First, we give a simple trick for extending the butterfly algorithm also to the case of non-commutative operations (which is advantageous also for the commutative case). Second, combining this with previous work, we give improved algorithms for the case where the number of processors is not a power of two. Third, we extend the algorithms also to the irregular case where the size of the resulting blocks may differ extremely.
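For orientation, the well-known butterfly scheme that these contributions build on can be simulated sequentially. The sketch below is the textbook recursive-halving pattern for the restricted case named in the abstract (number of processors a power of two, commutative operation, equal block sizes). It is not the improved algorithm of the paper, and the name `butterfly_reduce_scatter` is a hypothetical helper introduced here.

```python
from typing import Callable, List, Sequence, Tuple


def butterfly_reduce_scatter(
    vectors: Sequence[Sequence[int]],
    op: Callable[[int, int], int],
) -> Tuple[List[List[int]], int]:
    """Simulate recursive halving (illustrative sketch, commutative op assumed).

    Returns the per-processor result blocks and the number of rounds used.
    """
    p = len(vectors)
    assert p > 0 and p & (p - 1) == 0, "p must be a power of two"
    m = len(vectors[0])
    assert m % p == 0 and all(len(v) == m for v in vectors)
    b = m // p  # equal block size (regular case)

    data = [list(v) for v in vectors]
    first = [0] * p  # first block index processor r is still responsible for
    last = [p] * p   # one past the last such block index
    rounds = 0
    d = p // 2
    while d >= 1:
        nxt = [row[:] for row in data]  # all exchanges in a round happen together
        for r in range(p):
            partner = r ^ d
            mid = (first[r] + last[r]) // 2
            # Keep the half of the current range that contains block r;
            # the other half is what r would send to its partner.
            if r & d:
                first[r] = mid
            else:
                last[r] = mid
            for j in range(first[r] * b, last[r] * b):
                nxt[r][j] = op(data[r][j], data[partner][j])
        data = nxt
        d //= 2
        rounds += 1
    return [data[r][r * b:(r + 1) * b] for r in range(p)], rounds


inputs = [[10 * r + j for j in range(8)] for r in range(4)]
blocks, rounds = butterfly_reduce_scatter(inputs, lambda x, y: x + y)
```

In this scheme the first round combines the data of processors `r` and `r ^ (p // 2)`, which are not adjacent in rank order, so the sketch relies on commutativity as written.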

For \(p\) processors the algorithm requires \(\lceil \log_2 p \rceil + (\lceil \log_2 p \rceil - \lfloor \log_2 p \rfloor)\) communication rounds for the regular case, which may double for the irregular case (depending on the amount of irregularity). For vectors of size \(m\) with \(m = \sum_{i=0}^{p-1} m_i\) the total running time is \(O(\log p + m)\), irrespective of whether the \(m_i\) blocks are equal or not. The algorithm has been implemented, and on a small Myrinet cluster gives substantial improvements (up to a factor of 3 in the experiments reported) over other often used implementations. The reduce-scatter operation is a building block in the fence one-sided communication synchronization primitive, and for this application we also document worthwhile improvements over a previous implementation.
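The round count for the regular case can be evaluated directly from the formula in the abstract. The short sketch below does only that arithmetic; the function name `rounds_regular` is a hypothetical helper. The term in parentheses is 0 when the number of processors is a power of two and 1 otherwise, so the formula amounts to one extra round for non-powers of two.

```python
def rounds_regular(p: int) -> int:
    """Communication rounds for the regular case, per the stated formula:
    ceil(log2 p) + (ceil(log2 p) - floor(log2 p)).
    """
    assert p >= 1
    floor_log = p.bit_length() - 1   # floor(log2 p), exact for integers
    ceil_log = (p - 1).bit_length()  # ceil(log2 p), exact for integers
    return ceil_log + (ceil_log - floor_log)


# Round counts for a few processor counts, keyed by p.
table = {p: rounds_regular(p) for p in (1, 2, 6, 8, 9, 16)}
```

Integer bit lengths are used instead of floating-point logarithms to avoid rounding problems for large processor counts.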

Author information

Authors and Affiliations

C&C Research Laboratories, NEC Europe Ltd, Rathausallee 10, D-53757, Sankt Augustin, Germany
Jesper Larsson Träff

Editor information

Editors and Affiliations

Dipartimento di Ingegneria dell’ Informazione, Second University of Naples - Italy, Real Casa dell’Annunziata - via Roma, 29, 81031, Aversa, CE, Italy
Beniamino Di Martino
GUP, Institute of Graphics and Parallel Processing, Johannes Kepler University, Altenbergerstraße 69, A-4040, Linz, Austria
Dieter Kranzlmüller
Computer Science Department, University of Tennessee, 37996-3450, Knoxville, TN, USA
Jack Dongarra

About this paper

Cite this paper

Träff, J.L. (2005). An Improved Algorithm for (Non-commutative) Reduce-Scatter with an Application. In: Di Martino, B., Kranzlmüller, D., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2005. Lecture Notes in Computer Science, vol 3666. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11557265_20

DOI: https://doi.org/10.1007/11557265_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29009-4
Online ISBN: 978-3-540-31943-6
eBook Packages: Computer Science (R0)
