OpenMP Extension for Explicit Task Allocation on NUMA Architecture

Lee, Jinpil; Tsugane, Keisuke; Murai, Hitoshi; Sato, Mitsuhisa

doi:10.1007/978-3-319-45550-1_7

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9903)

Included in the following conference series:

International Workshop on OpenMP

Abstract

Most modern HPC systems consist of a number of cores grouped into multiple NUMA nodes. The latest Intel processors have multiple NUMA nodes inside a chip. Task parallelism using OpenMP dependent tasks is a promising programming model for many-core architectures because it can exploit parallelism in irregular applications with fine-grain synchronization. However, the current specification lacks functionality to improve data locality in task parallelism. In this paper, we propose an extension to the OpenMP task construct that specifies the location of tasks so that locality can be exploited explicitly. The prototype compiler is implemented based on GCC. The performance evaluation using the KASTORS benchmark shows that our approach can reduce remote page accesses. The Jacobi kernel using our approach shows 3.6 times better performance than GCC when using 36 threads on a 36-core machine with four NUMA nodes.
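To make the programming model concrete, the following is a minimal sketch of a blocked 1D Jacobi sweep written with standard OpenMP 4.0 dependent tasks. It is not the syntax proposed in the paper, which this page does not show. The names `block_to_node`, `sweep_block`, and `jacobi_tasks`, and the contiguous block-to-node distribution, are assumptions made for illustration. The intended NUMA node of each block is only recorded in `node_of`, because the standard `task` construct used here has no way to place a task on a specific node.

```c
#include <assert.h>

/* Hypothetical helper (an assumption, not from the paper): map block b of
 * nblocks onto one of nnodes NUMA nodes in contiguous chunks. */
int block_to_node(int b, int nblocks, int nnodes)
{
    assert(nblocks > 0 && nnodes > 0 && b >= 0 && b < nblocks);
    return (int)(((long)b * nnodes) / nblocks);
}

/* Update one block: each interior point becomes the average of its two
 * neighbours in src; the two end points of the array stay fixed. */
void sweep_block(const double *src, double *dst, int n, int bs, int b)
{
    int lo = b * bs, hi = lo + bs;
    for (int i = lo; i < hi; i++) {
        if (i == 0 || i == n - 1)
            dst[i] = src[i];
        else
            dst[i] = 0.5 * (src[i - 1] + src[i + 1]);
    }
}

/* Run iters Jacobi sweeps over n points split into nblocks blocks.
 * One task is created per block and per sweep; depend clauses on the first
 * element of each block order the tasks against their neighbours.
 * node_of[k] receives the node an explicit-placement mechanism would be
 * asked to use for block k. Returns the buffer holding the final values. */
double *jacobi_tasks(double *a, double *b, int n, int nblocks, int iters,
                     int nnodes, int *node_of)
{
    assert(n > 0 && nblocks > 0 && n % nblocks == 0);
    int bs = n / nblocks;
    double *src = a, *dst = b;

    for (int k = 0; k < nblocks; k++)
        node_of[k] = block_to_node(k, nblocks, nnodes);

    #pragma omp parallel
    #pragma omp single
    {
        for (int it = 0; it < iters; it++) {
            for (int k = 0; k < nblocks; k++) {
                int l = (k > 0) ? k - 1 : k;            /* left neighbour  */
                int r = (k < nblocks - 1) ? k + 1 : k;  /* right neighbour */
                #pragma omp task firstprivate(src, dst, k) \
                    depend(in: src[l * bs], src[k * bs], src[r * bs]) \
                    depend(out: dst[k * bs])
                sweep_block(src, dst, n, bs, k);
            }
            /* Swap buffers; tasks already created keep their own copies. */
            double *t = src; src = dst; dst = t;
        }
    }   /* implicit barriers: all tasks have finished here */
    return src;
}
```

Compiled with `gcc -fopenmp`, the tasks of one sweep can overlap with those of the next wherever the dependences allow; compiled without OpenMP support, the pragmas are ignored and the same code runs serially with the same result. A placement extension of the kind the abstract describes would let the programmer attach a location, such as the value in `node_of[k]`, to each task instead of leaving the choice to the runtime scheduler.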

Author information

Authors and Affiliations

RIKEN Advanced Institute for Computational Science, Kobe, Japan
Jinpil Lee, Hitoshi Murai & Mitsuhisa Sato
University of Tsukuba, Tsukuba, Japan
Keisuke Tsugane

Corresponding author

Correspondence to Jinpil Lee.

Editor information

Editors and Affiliations

RIKEN AICS, Kobe, Japan
Naoya Maruyama
Lawrence Livermore National Laboratory, Livermore, California, USA
Bronis R. de Supinski
RIKEN AICS, Kobe, Japan
Mohamed Wahib

About this paper

Cite this paper

Lee, J., Tsugane, K., Murai, H., Sato, M. (2016). OpenMP Extension for Explicit Task Allocation on NUMA Architecture. In: Maruyama, N., de Supinski, B., Wahib, M. (eds) OpenMP: Memory, Devices, and Tasks. IWOMP 2016. Lecture Notes in Computer Science, vol 9903. Springer, Cham. https://doi.org/10.1007/978-3-319-45550-1_7

DOI: https://doi.org/10.1007/978-3-319-45550-1_7
Published: 21 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45549-5
Online ISBN: 978-3-319-45550-1
eBook Packages: Computer Science, Computer Science (R0)
