OpenMP Extension for Explicit Task Allocation on NUMA Architecture

  • Jinpil Lee
  • Keisuke Tsugane
  • Hitoshi Murai
  • Mitsuhisa Sato
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9903)


Most modern HPC systems consist of many cores grouped into multiple NUMA nodes, and the latest Intel processors have multiple NUMA nodes inside a single chip. Task parallelism using OpenMP dependent tasks is a promising programming model for many-core architectures because it can exploit parallelism in irregular applications with fine-grain synchronization. However, the current OpenMP specification lacks functionality to improve data locality in task parallelism. In this paper, we propose an extension to the OpenMP task construct that lets programmers explicitly specify where tasks are located, so that data locality can be exploited. We implemented a prototype compiler based on GCC. A performance evaluation using the KASTORS benchmark suite shows that our approach can reduce remote page accesses. With our approach, the Jacobi kernel runs 3.6 times faster than with GCC when using 36 threads on a 36-core machine with four NUMA nodes.
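The dependent-task model mentioned above can be illustrated with a small sketch. The C function below runs a blocked 1D Jacobi relaxation in which each task updates one block and declares, through the standard OpenMP 4.0 `depend` clause, that it reads its own and its neighbouring blocks and writes its own block. The runtime then orders tasks by these data dependences instead of placing a barrier after every sweep. This is an illustrative sketch, not the paper's code. The names `jacobi_1d_tasks` and `BS` are hypothetical, and the proposed placement extension is not shown because its syntax is not given here. Nothing in these standard clauses says on which NUMA node a task should run, which is the gap the abstract describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block size: interior points updated by one task. */
#define BS 4

/* Runs `iters` Jacobi sweeps u[i] <- (u[i-1] + u[i+1]) / 2 over the interior
   of a[0..n-1], using b (length n) as the second buffer. a[0] and a[n-1] are
   fixed boundary values. Returns the buffer holding the final iterate. */
double *jacobi_1d_tasks(double *a, double *b, size_t n, int iters) {
    assert(n >= 2 + BS && (n - 2) % BS == 0);
    const size_t nb = (n - 2) / BS;   /* number of task blocks */
    b[0] = a[0];                      /* copy fixed boundaries */
    b[n - 1] = a[n - 1];

    #pragma omp parallel
    #pragma omp single
    for (int t = 0; t < iters; t++) {
        /* Double buffering: even sweeps read a and write b, odd sweeps swap. */
        double *src = (t % 2 == 0) ? a : b;
        double *dst = (t % 2 == 0) ? b : a;
        for (size_t k = 0; k < nb; k++) {
            size_t lo = 1 + k * BS;               /* first point of block k */
            size_t l  = (k > 0)      ? k - 1 : k; /* left neighbour block   */
            size_t r  = (k + 1 < nb) ? k + 1 : k; /* right neighbour block  */
            /* Each block is represented by its first element. The task reads
               its own and neighbouring blocks (RAW) and writes its own block;
               the out dependence also orders it after earlier readers (WAR). */
            #pragma omp task firstprivate(src, dst, lo) \
                depend(in: src[1 + l * BS], src[lo], src[1 + r * BS]) \
                depend(out: dst[lo])
            for (size_t i = lo; i < lo + BS; i++)
                dst[i] = 0.5 * (src[i - 1] + src[i + 1]);
        }
    }
    /* The implicit barrier at the end of single/parallel waits for all tasks. */
    return (iters % 2 == 0) ? a : b;
}
```

Compiled with `gcc -fopenmp`, the tasks can run in parallel. Without `-fopenmp`, GCC ignores the pragmas and the same code runs sequentially with identical results, because each sweep reads only the previous buffer.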


Keywords: OpenMP, Task parallelism, NUMA optimization


  1. Barcelona OpenMP Task Suite (BOTS).
  2. Drebes, A., Heydemann, K., Drach, N., Pop, A., Cohen, A.: Topology-aware and dependence-aware scheduling and memory allocation for task-parallel languages. ACM Trans. Archit. Code Optim. 11(3), 30:1–30:25 (2014)
  3. Duran, A., Teruel, X., Ferrer, R., Martorell, X., Ayguade, E.: Barcelona OpenMP tasks suite: a set of benchmarks targeting the exploitation of task parallelism in OpenMP. In: Proceedings of the 2009 International Conference on Parallel Processing, ICPP 2009, pp. 124–131. IEEE Computer Society, Washington, DC (2009). doi: 10.1109/ICPP.2009.64
  4.
  5. Muddukrishna, A., Jonsson, P.A., Vlassov, V., Brorsson, M.: Locality-aware task scheduling and data distribution on NUMA systems. In: Rendell, A.P., Chapman, B.M., Müller, M.S. (eds.) IWOMP 2013. LNCS, vol. 8122, pp. 156–170. Springer, Heidelberg (2013). doi: 10.1007/978-3-642-40698-0_12
  6. Olivier, S.L., Porterfield, A.K., Wheeler, K.B., Spiegel, M., Prins, J.F.: OpenMP task scheduling strategies for multicore NUMA systems. Int. J. High Perform. Comput. Appl. 26(2), 110–124 (2012). doi: 10.1177/1094342011434065
  7. Olivier, S.L., de Supinski, B.R., Schulz, M., Prins, J.F.: Characterizing and mitigating work time inflation in task parallel programs. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC 2012, pp. 65:1–65:12. IEEE Computer Society Press, Los Alamitos (2012)
  8. Tahan, O.: Towards efficient OpenMP strategies for non-uniform architectures. CoRR abs/1411.7131 (2014)
  9. Vikranth, B., Wankar, R., Rao, C.R.: Topology aware task stealing for on-chip NUMA multi-core processors. Procedia Comput. Sci. 18, 379–388 (2013). 2013 International Conference on Computational Science
  10. Virouleau, P., Brunet, P., Broquedis, F., Furmento, N., Thibault, S., Aumage, O., Gautier, T.: Evaluation of OpenMP dependent tasks with the KASTORS benchmark suite. In: DeRose, L., de Supinski, B.R., Olivier, S.L., Chapman, B.M., Müller, M.S. (eds.) IWOMP 2014. LNCS, vol. 8766, pp. 16–29. Springer, Heidelberg (2014). doi: 10.1007/978-3-319-11454-5_2

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Jinpil Lee (1)
  • Keisuke Tsugane (2)
  • Hitoshi Murai (1)
  • Mitsuhisa Sato (1)

  1. RIKEN Advanced Institute for Computational Science, Kobe, Japan
  2. University of Tsukuba, Tsukuba, Japan