The Cooperative Parallel: A Discussion About Run-Time Schedulers for Nested Parallelism

Royuela, Sara; Serrano, Maria A.; Garcia-Gasulla, Marta; Mateo Bellido, Sergi; Labarta, Jesús; Quiñones, Eduardo

doi:10.1007/978-3-030-28596-8_12

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 11718)

Included in the following conference series:

International Workshop on OpenMP

Abstract

Nested parallelism is a well-known parallelization strategy for exploiting irregular parallelism in HPC applications. The strategy also fits critical real-time embedded systems, which are composed of a set of concurrent functionalities; there, nested parallelism can further exploit the parallelism within each functionality. However, current run-time implementations of nested parallelism can produce inefficiencies and load imbalance. Moreover, in critical real-time embedded systems, they may lead to incorrect executions due to, for instance, a non-work-conserving scheduler. In both cases, the reason is that the teams of OpenMP threads are a black box for the scheduler: the scheduler that assigns OpenMP threads and tasks to the available computing resources is agnostic to the internal execution of each team.
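The black-box effect can be illustrated with a toy model. The sketch below is not the paper's runtime or any OpenMP implementation; it assumes a hypothetical setup in which each team owns a fixed number of cores, all tasks are ready at time 0 with no dependencies, and a team's cores are not handed to another team once its own work runs out. The function names and task durations are made up for illustration.

```python
import heapq


def team_finish_time(task_durations, cores):
    """Greedy list scheduling of one team's tasks on its own cores."""
    free_at = [0] * cores  # time at which each core becomes free
    heapq.heapify(free_at)
    for duration in task_durations:
        start = heapq.heappop(free_at)  # earliest available core in the team
        heapq.heappush(free_at, start + duration)
    return max(free_at)


def blackbox_schedule(teams):
    """Schedule each team in isolation.

    `teams` is a list of (task_durations, cores) pairs.
    Returns (makespan, idle core-time summed over all cores).
    """
    finishes = [team_finish_time(tasks, cores) for tasks, cores in teams]
    makespan = max(finishes)
    total_cores = sum(cores for _, cores in teams)
    busy = sum(sum(tasks) for tasks, _ in teams)
    return makespan, makespan * total_cores - busy


# Two concurrent functionalities, two cores each (hypothetical workloads).
teams = [([4, 4, 4, 4], 2), ([3, 3], 2)]
print(blackbox_schedule(teams))  # (8, 10)
```

In this example the second team finishes at time 3 and its two cores stay idle until time 8, even though the first team still has ready tasks. That is the kind of load imbalance a scheduler cannot avoid when it has no view inside the teams.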

This paper proposes a new run-time scheduler that considers dynamic information about the OpenMP threads and tasks running within several concurrent teams, i.e., concurrent parallel regions. This information may include the existence of OpenMP threads waiting at a barrier and the priority of tasks ready to execute. By making the concurrent parallel regions cooperate, the shared computing resources can be better controlled, and a work-conserving, priority-driven scheduler can be guaranteed.
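The idea of a work-conserving, priority-driven scheduler shared by several teams can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the scheduler proposed in the paper: all tasks are ready at time 0, there are no dependencies or barriers, tasks are not preempted, and a larger number means a higher priority. The function name, task names, and durations are hypothetical.

```python
import heapq


def cooperative_schedule(tasks, cores):
    """Simulate one ready queue shared by all teams.

    `tasks` is a list of (name, priority, duration); larger priority
    runs first. Returns (makespan, start_times), where start_times maps
    each task name to the time it starts.
    """
    # Shared ready queue: highest priority first, ties by submission order.
    ready = [(-prio, idx, name, dur)
             for idx, (name, prio, dur) in enumerate(tasks)]
    heapq.heapify(ready)
    free_at = [0] * cores  # all cores form one pool, whichever team asks
    heapq.heapify(free_at)
    start_times = {}
    makespan = 0
    while ready:
        _, _, name, dur = heapq.heappop(ready)
        # Work conserving: the next ready task takes the first core to
        # become free, so no core idles while ready work exists.
        start = heapq.heappop(free_at)
        start_times[name] = start
        heapq.heappush(free_at, start + dur)
        makespan = max(makespan, start + dur)
    return makespan, start_times


# Team A: four low-priority tasks; team B: two high-priority tasks.
tasks = [("a1", 1, 4), ("a2", 1, 4), ("a3", 1, 4), ("a4", 1, 4),
         ("b1", 2, 3), ("b2", 2, 3)]
makespan, starts = cooperative_schedule(tasks, cores=4)
print(makespan)                  # 7
print(starts["b1"], starts["a3"])  # 0 3
```

Here the high-priority tasks of team B start immediately, and the cores they release at time 3 are picked up by the remaining tasks of team A. With a fixed partition of two cores per team, team A alone would need 8 time units for its four tasks, so sharing the cores shortens the makespan to 7 in this toy case.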

Notes

1. The parallel region that encloses the two functionalities is not shown for simplicity.

Acknowledgments

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 780622.

Author information

Authors and Affiliations

Barcelona Supercomputing Center, Barcelona, Spain
Sara Royuela, Maria A. Serrano, Marta Garcia-Gasulla, Sergi Mateo Bellido, Jesús Labarta & Eduardo Quiñones

Corresponding authors

Correspondence to Sara Royuela, Maria A. Serrano, Marta Garcia-Gasulla, Sergi Mateo Bellido, Jesús Labarta or Eduardo Quiñones.

Editor information

Editors and Affiliations

University of Auckland, Auckland, New Zealand
Xing Fan
Lawrence Livermore National Laboratory, Livermore, CA, USA
Bronis R. de Supinski
University of Auckland, Auckland, New Zealand
Oliver Sinnen
University of Auckland, Auckland, New Zealand
Nasser Giacaman

About this paper

Cite this paper

Royuela, S., Serrano, M.A., Garcia-Gasulla, M., Mateo Bellido, S., Labarta, J., Quiñones, E. (2019). The Cooperative Parallel: A Discussion About Run-Time Schedulers for Nested Parallelism. In: Fan, X., de Supinski, B., Sinnen, O., Giacaman, N. (eds) OpenMP: Conquering the Full Hardware Spectrum. IWOMP 2019. Lecture Notes in Computer Science, vol 11718. Springer, Cham. https://doi.org/10.1007/978-3-030-28596-8_12
DOI: https://doi.org/10.1007/978-3-030-28596-8_12
Published: 09 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28595-1
Online ISBN: 978-3-030-28596-8
eBook Packages: Computer Science, Computer Science (R0)