Abstract
Modern shared-memory HPC platforms expose a large number of cores organized hierarchically. Parallel application programmers struggle to express increasingly fine-grained parallelism and to ensure locality on such NUMA platforms. Independent loops are a natural source of parallelism. Parallel environments like OpenMP provide ways to parallelize them efficiently, but the achieved performance depends closely on parameters such as the granularity of work and the choice of loop scheduler. Since both can depend on the target computer, the input data, and the loop workload, application programmers most of the time fail to design implementations that are both portable and efficient. In this paper we propose a new OpenMP loop scheduler, called adaptive, that dynamically adapts the granularity of work to the underlying system state. Our scheduler performs dynamic load balancing while taking memory affinity into account on NUMA architectures. Results show that adaptive outperforms state-of-the-art OpenMP loop schedulers on memory-bound irregular applications, while achieving performance comparable to the static scheduler on parallel loops with a regular workload.
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Durand, M., Broquedis, F., Gautier, T., Raffin, B. (2013). An Efficient OpenMP Loop Scheduler for Irregular Applications on Large-Scale NUMA Machines. In: Rendell, A.P., Chapman, B.M., Müller, M.S. (eds) OpenMP in the Era of Low Power Devices and Accelerators. IWOMP 2013. Lecture Notes in Computer Science, vol 8122. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40698-0_11
Print ISBN: 978-3-642-40697-3
Online ISBN: 978-3-642-40698-0