Simplifying Communication Overlap in OpenSHMEM Through Integrated User-Level Thread Scheduling

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12151)

Abstract

Overlap of communication with computation is a key optimization for high performance computing (HPC) applications. In this paper, we explore the use of user-level threading to enable productive and efficient communication overlap and pipelining. We extend OpenSHMEM with integrated user-level thread scheduling, enabling applications to leverage fine-grain threading as an alternative to non-blocking communication. Our solution introduces communication-aware thread scheduling that utilizes the communication state of threads to minimize context switching overheads. We identify several patterns common to multi-threaded OpenSHMEM applications, leverage user-level threads to increase overlap of communication and computation, and explore the impact of different thread scheduling policies. Results indicate that user-level threading can enable blocking communication to meet the performance of highly optimized, non-blocking, single-threaded codes with significantly lower application-level complexity. In one case, we observe a 28.7% performance improvement for the Smith-Waterman DNA sequence alignment benchmark.
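
To make the pattern concrete, the following is a minimal sketch (not the authors' implementation) of the programming style the abstract describes: each user-level thread issues an ordinary blocking OpenSHMEM get and then computes on the data it fetched, so that a communication-aware scheduler can overlap one thread's communication with another thread's computation. The sketch assumes Argobots for user-level threading and a standard OpenSHMEM library; the chunk size, worker function, and buffer names are illustrative only, and plain Argobots will not switch threads inside a blocking call without the integrated scheduling the paper proposes.

/* Sketch only: one user-level thread per data chunk; each thread does a
 * blocking shmem_getmem() followed by computation on the fetched chunk.
 * With communication-aware ULT scheduling, the blocking call is where a
 * thread would be switched out so other threads' computation proceeds. */
#include <shmem.h>
#include <abt.h>

#define NCHUNKS    8
#define CHUNK_SIZE 4096                 /* doubles per chunk; illustrative */

static double *remote_buf;              /* symmetric buffer on every PE */

typedef struct {
    int    chunk;
    double local[CHUNK_SIZE];
    double result;
} work_t;

/* One user-level thread per chunk: blocking get, then compute. */
static void worker(void *arg)
{
    work_t *w = (work_t *)arg;
    int target = (shmem_my_pe() + 1) % shmem_n_pes();

    /* Blocking fetch; an integrated scheduler could run other ULTs here. */
    shmem_getmem(w->local, remote_buf + (size_t)w->chunk * CHUNK_SIZE,
                 CHUNK_SIZE * sizeof(double), target);

    w->result = 0.0;
    for (int i = 0; i < CHUNK_SIZE; i++)   /* per-chunk computation */
        w->result += w->local[i];
}

int main(int argc, char **argv)
{
    shmem_init();
    ABT_init(argc, argv);

    remote_buf = shmem_malloc(NCHUNKS * CHUNK_SIZE * sizeof(double));

    /* Run all ULTs on the main execution stream's default pool. */
    ABT_xstream xstream;
    ABT_pool    pool;
    ABT_xstream_self(&xstream);
    ABT_xstream_get_main_pools(xstream, 1, &pool);

    static work_t work[NCHUNKS];
    ABT_thread threads[NCHUNKS];
    for (int c = 0; c < NCHUNKS; c++) {
        work[c].chunk = c;
        ABT_thread_create(pool, worker, &work[c],
                          ABT_THREAD_ATTR_NULL, &threads[c]);
    }
    for (int c = 0; c < NCHUNKS; c++) {
        ABT_thread_join(threads[c]);
        ABT_thread_free(&threads[c]);
    }

    shmem_free(remote_buf);
    ABT_finalize();
    shmem_finalize();
    return 0;
}

The hand-coded non-blocking alternative would replace the blocking get with shmem_getmem_nbi plus explicit shmem_quiet calls and manual pipelining state; avoiding that application-level bookkeeping while retaining overlap is the productivity benefit the abstract claims.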

Change history

  • 15 June 2020

    The original versions of chapters 17 and 24 were previously published non-open access. They have now been made open access under a CC BY 4.0 license, and the copyright holder has been changed to ‘The Author(s).’ The book has also been updated with this change.

    Chapters 19 and 25 were inadvertently published open access. This has been corrected, and the chapters are now non-open access.

Author information

Corresponding author

Correspondence to Md. Wasi-ur-Rahman.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Wasi-ur-Rahman, M., Ozog, D., Dinan, J. (2020). Simplifying Communication Overlap in OpenSHMEM Through Integrated User-Level Thread Scheduling. In: Sadayappan, P., Chamberlain, B.L., Juckeland, G., Ltaief, H. (eds) High Performance Computing. ISC High Performance 2020. Lecture Notes in Computer Science, vol 12151. Springer, Cham. https://doi.org/10.1007/978-3-030-50743-5_25

  • DOI: https://doi.org/10.1007/978-3-030-50743-5_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-50742-8

  • Online ISBN: 978-3-030-50743-5

  • eBook Packages: Computer Science, Computer Science (R0)
