Description, Implementation and Evaluation of an Affinity Clause for Task Directives

Virouleau, Philippe; Roussel, Adrien; Broquedis, François; Gautier, Thierry; Rastello, Fabrice; Gratien, Jean-Marc

doi:10.1007/978-3-319-45550-1_5

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9903)

Included in the following conference series:

International Workshop on OpenMP

Abstract

OpenMP 4.0 introduced dependent tasks, which give the programmer a way to express fine-grained parallelism. Using appropriate OS support (such as NUMA libraries), the runtime can rely on the information in the depend clause to dynamically map tasks to the architecture topology. Controlling data locality is one of the key factors in reaching high performance on NUMA architectures. On this topic, OpenMP does not yet give the programmer much flexibility; the runtime decides where a task should be executed. In this paper, we present a class of applications that would benefit from such control and flexibility over task and data placement. We also propose our own interpretation of the new affinity clause for the task directive, which is being discussed by the OpenMP Architecture Review Board. This clause enables the programmer to give hints to the runtime about task placement during program execution, which can be used to control the data mapping on the architecture. In our proposal, the programmer can express affinity between a task and the following resources: a thread, a NUMA node, and a piece of data. We then present an implementation of this proposal in the Clang-3.8 compiler, and an implementation of the corresponding extensions in our OpenMP runtime libKOMP. Finally, we present a preliminary evaluation of this work running two task-based OpenMP kernels on a 192-core NUMA architecture, which shows noticeable improvements in both performance and scalability.

Acknowledgments

This work is part of, and supported by, the ELCI project, a French FSN (“Fonds pour la Société Numérique”) project that brings together academic and industrial partners to design and provide a software environment for very high performance computing.

Author information

Authors and Affiliations

Inria, Univ. Grenoble Alpes, CNRS, Grenoble Institute of Technology, LIG, Grenoble, France
Philippe Virouleau, Adrien Roussel, François Broquedis, Thierry Gautier & Fabrice Rastello
LIP, ENS de Lyon, Lyon, France
Philippe Virouleau, Adrien Roussel & Thierry Gautier
IFPEN, Rueil Malmaison, France
Adrien Roussel & Jean-Marc Gratien

Corresponding author

Correspondence to Philippe Virouleau.

Editor information

Editors and Affiliations

RIKEN AICS, Kobe, Japan
Naoya Maruyama
Lawrence Livermore National Laboratory, Livermore, California, USA
Bronis R. de Supinski
RIKEN AICS, Kobe, Japan
Mohamed Wahib

About this paper

Cite this paper

Virouleau, P., Roussel, A., Broquedis, F., Gautier, T., Rastello, F., Gratien, J.-M. (2016). Description, Implementation and Evaluation of an Affinity Clause for Task Directives. In: Maruyama, N., de Supinski, B., Wahib, M. (eds) OpenMP: Memory, Devices, and Tasks. IWOMP 2016. Lecture Notes in Computer Science, vol 9903. Springer, Cham. https://doi.org/10.1007/978-3-319-45550-1_5

DOI: https://doi.org/10.1007/978-3-319-45550-1_5
Published: 21 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45549-5
Online ISBN: 978-3-319-45550-1
eBook Packages: Computer Science (R0)
