Abstract
Nested parallelism is a well-known parallelization strategy for exploiting irregular parallelism in HPC applications. The strategy also fits critical real-time embedded systems, which are composed of a set of concurrent functionalities: there, nested parallelism can further exploit the parallelism within each functionality. However, current run-time implementations of nested parallelism can produce inefficiencies and load imbalance. Moreover, in critical real-time embedded systems, they may lead to incorrect executions due to, for instance, a non-work-conserving scheduler. In both cases, the cause is that the teams of OpenMP threads are a black box for the scheduler: the scheduler that assigns OpenMP threads and tasks to the available computing resources is agnostic to the internal execution of each team.
This paper proposes a new run-time scheduler that considers dynamic information about the OpenMP threads and tasks running within several concurrent teams, i.e., concurrent parallel regions. This information may include the existence of OpenMP threads waiting in a barrier and the priority of tasks ready to execute. By making the concurrent parallel regions cooperate, the shared computing resources can be better controlled, and a work-conserving and priority-driven scheduler can be guaranteed.
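The cooperation described above can be illustrated with a minimal sketch. It is not the scheduler proposed in the paper: the `Team` record, its fields, and the `cooperative_schedule` function are hypothetical names introduced only for this illustration, and the model ignores everything except the two pieces of dynamic information the abstract mentions (threads waiting in a barrier and priorities of ready tasks).

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Team:
    """One concurrent parallel region (hypothetical model, not an OpenMP API)."""
    name: str
    waiting_in_barrier: int = 0                 # threads currently idle in a barrier
    ready: list = field(default_factory=list)   # (priority, task_name); higher runs first


def cooperative_schedule(teams):
    """Match ready tasks to idle threads using a global view of all teams.

    Returns a list of (task_name, owner_team, lender_team) tuples.
    """
    # Assumption: a thread idling in a barrier may be lent to any team.
    idle = [t.name for t in teams for _ in range(t.waiting_in_barrier)]

    # Global ready queue; priorities are negated because heapq is a min-heap.
    # The counter breaks ties in insertion order.
    heap = []
    counter = 0
    for t in teams:
        for prio, task in t.ready:
            heapq.heappush(heap, (-prio, counter, task, t.name))
            counter += 1

    # Work-conserving: keep assigning while an idle thread and a ready task exist.
    # Priority-driven: the highest-priority ready task is always served first.
    assignments = []
    while idle and heap:
        _, _, task, owner = heapq.heappop(heap)
        lender = idle.pop(0)
        assignments.append((task, owner, lender))
    return assignments


if __name__ == "__main__":
    teams = [
        Team("A", waiting_in_barrier=2),
        Team("B", ready=[(1, "b_low"), (5, "b_high"), (3, "b_mid")]),
    ]
    for task, owner, lender in cooperative_schedule(teams):
        print(f"{task} (team {owner}) runs on an idle thread of team {lender}")
```

In this toy setting, a scheduler that treats each team as a black box would leave the two threads of team A idle in their barrier while team B still has ready tasks; the global view is what lets idle threads serve the highest-priority pending work.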
Notes
1. The parallel region that encloses the two functionalities is not shown for simplicity.
Acknowledgments
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 780622.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Royuela, S., Serrano, M.A., Garcia-Gasulla, M., Mateo Bellido, S., Labarta, J., Quiñones, E. (2019). The Cooperative Parallel: A Discussion About Run-Time Schedulers for Nested Parallelism. In: Fan, X., de Supinski, B., Sinnen, O., Giacaman, N. (eds) OpenMP: Conquering the Full Hardware Spectrum. IWOMP 2019. Lecture Notes in Computer Science, vol 11718. Springer, Cham. https://doi.org/10.1007/978-3-030-28596-8_12
DOI: https://doi.org/10.1007/978-3-030-28596-8_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28595-1
Online ISBN: 978-3-030-28596-8
eBook Packages: Computer Science, Computer Science (R0)