Abstract
As clusters of multicore nodes become the standard platform for HPC, programmers are adopting approaches that combine multicore programming (e.g., OpenMP) for on-node parallelism with MPI for inter-node parallelism: the so-called "MPI+X". In important use cases, such as reductions, this hybrid approach can necessitate a scalability-limiting sequence of independent parallel operations, one for each paradigm. For example, MPI+OpenMP typically performs a global parallel reduction by first performing a local OpenMP reduction, followed by an MPI reduction across the nodes. If the local reductions are not well balanced, which can happen in irregular or dynamically adaptive applications, the scalability of the overall reduction operation becomes limited. In this paper, we study the impact of imbalanced reductions on two different execution models, MPI+X and Asynchronous Many-Task (AMT), with MPI+OpenMP and HPX-5 as concrete instances of these respective models. We explore several approaches to maximizing asynchrony with the HPX-5 and MPI+OpenMP collective programming interfaces and characterize the imbalance using a specialized set of microbenchmarks. Despite maximizing MPI+OpenMP asynchrony, we find situations where the scalability of the MPI+X programming model is significantly impaired for two-phase reductions. We report relative performance degradation of MPI+X ranging from 0.5X to 6.5X with respect to the AMT instance.
Notes
1. Quote attributed to Bill Gropp.
2. This can be superseded by MPI-4 Endpoints [16] if the proposal is accepted.
3. We model sequential work as a compute segment with so many data dependencies that any parallelization of the respective code regions is either impossible or impractical.
4. For example, MPI would need to execute in MPI_THREAD_MULTIPLE mode with OpenMP, which may incur certain penalties compared to the regular mode.
5. Amdahl's Law can be applied in all other cases, when both sequential and parallel code regions are present in \(W_o\). However, that evaluation goes beyond the scope of this paper.
6. The optimal solution is found when \(T=t_{max}+t_{comm}\).
7. In fact, collectives in HPX-5 are data driven rather than execution driven. The identity of the joining threads is inconsequential, and the completion of a collective operation triggers a set of registered continuations.
8. A collective (i.e., tree-based) algorithm was used consistently across all experiments and runtime modes.
9. Each parallel load injection \(t_{i}\) was scaled between \(t_{u}\) and \(3t_{u}\).
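Note 4 above refers to the MPI threading level required when multiple OpenMP threads make MPI calls. A minimal initialization sketch, assuming an MPI installation is available (this fragment requires mpicc/mpirun and is not part of the paper's benchmark code):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided;
    /* Request full thread support so that any OpenMP thread may
       call MPI; the library reports the level actually granted. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        fprintf(stderr, "MPI_THREAD_MULTIPLE not granted (got %d)\n",
                provided);
    /* ... threaded communication ... */
    MPI_Finalize();
    return 0;
}
```

Checking `provided` matters because an MPI library may legally grant a lower level (e.g., MPI_THREAD_FUNNELED) than requested; running multithreaded communication anyway is undefined behavior.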
References
Beckman, P., Iskra, K., Yoshii, K., Coghlan, S., Nataraj, A.: Benchmarking the effects of operating system interference on extreme-scale parallel machines. Cluster Comput. 11(1), 3–16 (2008). https://doi.org/10.1007/s10586-007-0047-2
Ferreira, K.B., Bridges, P., Brightwell, R.: Characterizing application sensitivity to OS interference using kernel-level noise injection. In: Proceedings of SC 2008, pp. 19:1–19:12. IEEE Press, Piscataway (2008). http://dl.acm.org/citation.cfm?id=1413370.1413390
Hoefler, T., Schneider, T., Lumsdaine, A.: The impact of network noise at large-scale communication performance. In: IPDPS 2009, pp. 1–8 (2009). https://doi.org/10.1109/IPDPS.2009.5161095
Kaiser, H., Brodowicz, M., Sterling, T.: ParalleX: an advanced parallel execution model for scaling-impaired applications. In: Proceedings of ICPPW 2009, pp. 394–401. IEEE Computer Society, Washington, DC (2009). https://doi.org/10.1109/ICPPW.2009.14
Hoefler, T., Schneider, T., Lumsdaine, A.: Characterizing the influence of system noise on large-scale applications by simulation. In: Proceedings of SC 2010, pp. 1–11. IEEE Computer Society, Washington, DC (2010). https://doi.org/10.1109/SC.2010.12
Agarwal, S., Garg, R., Vishnoi, N.K.: The impact of noise on the scaling of collectives: a theoretical approach. In: Bader, D.A., Parashar, M., Sridhar, V., Prasanna, V.K. (eds.) HiPC 2005. LNCS, vol. 3769, pp. 280–289. Springer, Heidelberg (2005). https://doi.org/10.1007/11602569_31
CREST: HPX-5. http://hpx.crest.iu.edu
Blumofe, R.D., Leiserson, C.E.: Scheduling multithreaded computations by work stealing. J. ACM 46(5), 720–748 (1999)
Kissel, E., Swany, M.: Photon: remote memory access middleware for high-performance runtime systems. In: IPDPSW 2016, pp. 1736–1743 (2016). https://doi.org/10.1109/IPDPSW.2016.120
Wickramasinghe, U., D'Alessandro, L., Lumsdaine, A., Kissel, E., Swany, M., Newton, R.: Evaluating collectives in networks of multicore/two-level reduction. Technical report, Indiana University, School of Informatics and Computing (2017)
Bova, S., et al.: Combining message-passing and directives in parallel applications. SIAM News 32(9), 10–14 (1999)
Cappello, F., Etiemble, D.: MPI versus MPI+OpenMP on the IBM SP for the NAS benchmarks. In: Supercomputing, ACM/IEEE 2000 Conference, p. 12 (2000). https://doi.org/10.1109/SC.2000.10001
Corbalan, J., Duran, A., Labarta, J.: Dynamic load balancing of MPI+OpenMP applications. In: ICPP 2004, vol. 1, pp. 195–202 (2004). https://doi.org/10.1109/ICPP.2004.1327921
Huang, W., Tafti, D.: A parallel computing framework for dynamic power balancing in adaptive mesh refinement applications. In: Proceedings of Parallel Computational Fluid Dynamics, pp. 249–256 (1999)
Bauer, M., Treichler, S., Slaughter, E., Aiken, A.: Legion: expressing locality and independence with logical regions. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, p. 66. IEEE Computer Society Press (2012)
Dinan, J., et al.: Enabling communication concurrency through flexible MPI endpoints. Int. J. High Perform. Comput. Appl. 28(4), 390–405 (2014)
Dokulil, J., Sandrieser, M., Benkner, S.: OCR-Vx-an alternative implementation of the open community runtime. In: International Workshop on Runtime Systems for Extreme Scale Programming Models and Architectures, in Conjunction with SC15, Austin, Texas (2015)
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Wickramasinghe, U., Lumsdaine, A. (2019). Characterizing Performance of Imbalanced Collectives on Hybrid and Task Centric Runtimes for Two-Phase Reduction. In: Rauchwerger, L. (eds) Languages and Compilers for Parallel Computing. LCPC 2017. Lecture Notes in Computer Science(), vol 11403. Springer, Cham. https://doi.org/10.1007/978-3-030-35225-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-35224-0
Online ISBN: 978-3-030-35225-7
eBook Packages: Computer Science (R0)