Description, Implementation and Evaluation of an Affinity Clause for Task Directives

  • Philippe Virouleau
  • Adrien Roussel
  • François Broquedis
  • Thierry Gautier
  • Fabrice Rastello
  • Jean-Marc Gratien
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9903)


OpenMP 4.0 introduced dependent tasks, which give the programmer a way to express fine-grained parallelism. With appropriate OS support (such as NUMA libraries), the runtime can use the information in the depend clause to dynamically map tasks to the architecture topology. Controlling data locality is a key factor in reaching high performance on NUMA architectures. On this point, OpenMP does not yet give the programmer much flexibility: the runtime alone decides where a task executes. In this paper, we present a class of applications that would benefit from such control and flexibility over task and data placement. We also propose our own interpretation of the new affinity clause for the task directive, which is being discussed by the OpenMP Architecture Review Board. This clause lets the programmer give hints to the runtime about task placement during program execution, which can be used to control the data mapping on the architecture. In our proposal, the programmer can express affinity between a task and one of the following resources: a thread, a NUMA node, or a data item. We then present an implementation of this proposal in the Clang-3.8 compiler, and an implementation of the corresponding extensions in our OpenMP runtime libKOMP. Finally, we present a preliminary evaluation of this work on two task-based OpenMP kernels running on a 192-core NUMA architecture, which shows noticeable improvements in both performance and scalability.
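To make the abstract's starting point concrete, the following is a minimal sketch of OpenMP 4.0 dependent tasks over a blocked vector, written for illustration only. The function names `block_to_node` and `blocked_scale_sum`, the round-robin block-to-node mapping, and the block sizes are all assumptions, not taken from the paper. The exact spelling of the proposed affinity clause is not given in this text, so it appears only as a commented placeholder and is not compiled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: pick a "home" NUMA node for block b (round-robin). */
int block_to_node(size_t b, int num_nodes)
{
    return (int)(b % (size_t)num_nodes);
}

/* Initialize x[i] = i, then double every element, one pair of dependent
 * tasks per block. Returns the sum of the final vector. */
double blocked_scale_sum(double *x, size_t n, size_t block, int num_nodes)
{
    assert(block > 0 && num_nodes > 0);

    #pragma omp parallel
    #pragma omp single
    {
        for (size_t start = 0; start < n; start += block) {
            size_t len = (n - start < block) ? (n - start) : block;
            double *blk = x + start;
            int node = block_to_node(start / block, num_nodes);
            (void)node; /* only used by the placeholder hint below */

            /* Assumed placeholder, NOT standard OpenMP 4.0 and not compiled:
             * a placement hint for `node` would be added to the two task
             * directives below, e.g. something like affinity(node: node). */

            /* Producer: first touch of the block. */
            #pragma omp task depend(out: blk[0:len]) firstprivate(blk, len, start)
            for (size_t i = 0; i < len; i++)
                blk[i] = (double)(start + i);

            /* Consumer: ordered after the producer by the depend clause. */
            #pragma omp task depend(inout: blk[0:len]) firstprivate(blk, len)
            for (size_t i = 0; i < len; i++)
                blk[i] *= 2.0;
        }
        #pragma omp taskwait
    }

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum;
}
```

Calling `blocked_scale_sum(x, 10, 4, 2)` on a 10-element array creates three blocks. Compiled with `-fopenmp`, each block's two tasks are ordered by their `depend` clauses but may run on any thread, which is the placement freedom the abstract refers to. Compiled without it, the pragmas are ignored and the loops run sequentially with the same result.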


Keywords: OpenMP, Task dependencies, Affinity, Runtime systems, NUMA



This work is integrated and supported by the ELCI project, a French FSN (“Fond pour la Société Numérique”) project that associates academic and industrial partners to design and provide a software environment for very high performance computing.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Philippe Virouleau (1, 2), corresponding author
  • Adrien Roussel (1, 2, 3)
  • François Broquedis (1)
  • Thierry Gautier (1, 2)
  • Fabrice Rastello (1)
  • Jean-Marc Gratien (3)

  1. Inria, Univ. Grenoble Alpes, CNRS, Grenoble Institute of Technology, LIG, Grenoble, France
  2. LIP, ENS de Lyon, Lyon, France
  3. IFPEN, Rueil Malmaison, France
