Abstract
Overlapping communication with computation is a key optimization for high performance computing (HPC) applications. In this paper, we explore the use of user-level threading to enable productive and efficient communication overlap and pipelining. We extend OpenSHMEM with integrated user-level thread scheduling, enabling applications to leverage fine-grain threading as an alternative to non-blocking communication. Our solution introduces communication-aware thread scheduling that uses the communication state of threads to minimize context-switching overheads. We identify several patterns common to multi-threaded OpenSHMEM applications, leverage user-level threads to increase overlap of communication and computation, and explore the impact of different thread scheduling policies. Results indicate that user-level threading enables blocking communication to match the performance of highly optimized, non-blocking, single-threaded codes with significantly lower application-level complexity. In one case, we observe a 28.7% performance improvement for the Smith-Waterman DNA sequence alignment benchmark.
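To illustrate the pattern the abstract describes, the sketch below runs each blocking shmem_getmem inside an Argobots user-level thread (ULT). This is not the authors' implementation; NBLOCKS, BLOCK, worker, and the sum reduction are illustrative assumptions. With an integrated, communication-aware ULT scheduler of the kind the paper proposes, a thread that blocks on communication can be descheduled so another ready thread computes in the meantime; with a stock OpenSHMEM library the code still runs correctly, it simply overlaps less.

    /* Sketch: blocking OpenSHMEM gets issued from Argobots ULTs.
     * Hypothetical example; compile against OpenSHMEM and Argobots. */
    #include <stdio.h>
    #include <shmem.h>
    #include <abt.h>

    #define NBLOCKS 8
    #define BLOCK   1024

    static double *remote_buf; /* symmetric buffer, allocated on every PE */

    typedef struct {
        int    id;
        double local[BLOCK];
        double sum;
    } work_t;

    static void worker(void *arg)
    {
        work_t *w  = (work_t *)arg;
        int peer   = (shmem_my_pe() + 1) % shmem_n_pes();

        /* Blocking get: an integrated ULT scheduler can switch to another
         * ready ULT while this transfer is in flight. */
        shmem_getmem(w->local, remote_buf + w->id * BLOCK,
                     BLOCK * sizeof(double), peer);

        w->sum = 0.0; /* compute on the block just fetched */
        for (int i = 0; i < BLOCK; i++)
            w->sum += w->local[i];
    }

    int main(int argc, char **argv)
    {
        static work_t wk[NBLOCKS];
        ABT_thread    th[NBLOCKS];
        ABT_xstream   xstream;
        ABT_pool      pool;

        shmem_init();
        ABT_init(argc, argv);

        remote_buf = shmem_malloc(NBLOCKS * BLOCK * sizeof(double));
        for (int i = 0; i < NBLOCKS * BLOCK; i++)
            remote_buf[i] = 1.0;
        shmem_barrier_all(); /* make source data visible before any get */

        /* One ULT per block on the primary execution stream's main pool. */
        ABT_xstream_self(&xstream);
        ABT_xstream_get_main_pools(xstream, 1, &pool);
        for (int i = 0; i < NBLOCKS; i++) {
            wk[i].id = i;
            ABT_thread_create(pool, worker, &wk[i],
                              ABT_THREAD_ATTR_NULL, &th[i]);
        }

        double total = 0.0;
        for (int i = 0; i < NBLOCKS; i++) {
            ABT_thread_join(th[i]); /* schedules pending ULTs until done */
            ABT_thread_free(&th[i]);
            total += wk[i].sum;
        }
        printf("PE %d: total = %.1f\n", shmem_my_pe(), total);

        shmem_barrier_all();
        shmem_free(remote_buf);
        ABT_finalize();
        shmem_finalize();
        return 0;
    }

Note the design point this sketch makes concrete: the worker contains only straight-line blocking code, whereas the equivalent single-threaded pipeline would pair shmem_getmem_nbi with shmem_quiet and manual double buffering. That substitution is the source of the application-level complexity reduction the abstract claims.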
Cite this paper
Wasi-ur-Rahman, M., Ozog, D., Dinan, J.: Simplifying Communication Overlap in OpenSHMEM Through Integrated User-Level Thread Scheduling. In: Sadayappan, P., Chamberlain, B.L., Juckeland, G., Ltaief, H. (eds.) High Performance Computing. ISC High Performance 2020. Lecture Notes in Computer Science, vol. 12151. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-50743-5_25