Pragmatic Performance Portability with OpenMP 4.x

Martineau, Matt; Price, James; McIntosh-Smith, Simon; Gaudin, Wayne

doi:10.1007/978-3-319-45550-1_18

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9903)

Included in the following conference series:

International Workshop on OpenMP

Abstract

In this paper we investigate the current compiler technologies supporting OpenMP 4.x features targeting a range of devices, in particular the Cray compiler 8.5.0 targeting an Intel Xeon Broadwell and NVIDIA K20x, IBM’s OpenMP 4.5 Clang branch (clang-ykt) targeting an NVIDIA K20x, the Intel compiler 16 targeting an Intel Xeon Phi Knights Landing, and GCC 6.1 targeting an AMD APU. We outline the mechanisms that they use to map the OpenMP model onto their target architectures, and conduct performance testing with a number of representative data parallel kernels. Following this, we discuss the current state of play in terms of performance portability and propose some straightforward guidelines for writing performance portable code, derived from our observations. At the time of writing, developers will likely have to rely on the pre-processor for certain kernels to achieve functional portability, but we expect that future homogenisation of required directives between compilers and architectures is feasible.
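As a rough illustration of what relying on the pre-processor for functional portability can look like, the following minimal sketch selects between OpenMP directives for a simple data parallel kernel at build time. The macro names `OFFLOAD_TEAMS` and `OFFLOAD_TEAMS_SIMD` and the function `triad` are hypothetical, chosen only for this example; the sketch does not reproduce the kernels or the directive choices used in the paper, and it makes no claim about which compiler needs which variant.

```c
#include <assert.h>

/*
 * Triad-style kernel: a[i] = b[i] + scalar * c[i].
 * OFFLOAD_TEAMS and OFFLOAD_TEAMS_SIMD are hypothetical build-time
 * switches (e.g. passed with -D) and are not taken from the paper.
 * When neither is defined, the loop falls back to host threading.
 */
void triad(double *a, const double *b, const double *c,
           double scalar, int n)
{
    assert(a && b && c && n >= 0);

#if defined(OFFLOAD_TEAMS)
    /* Offload variant using the OpenMP 4.0 combined target construct. */
    #pragma omp target teams distribute parallel for \
        map(from: a[0:n]) map(to: b[0:n], c[0:n])
#elif defined(OFFLOAD_TEAMS_SIMD)
    /* Offload variant that additionally requests SIMD execution. */
    #pragma omp target teams distribute parallel for simd \
        map(from: a[0:n]) map(to: b[0:n], c[0:n])
#else
    /* Host-only variant. */
    #pragma omp parallel for
#endif
    for (int i = 0; i < n; ++i) {
        a[i] = b[i] + scalar * c[i];
    }
}
```

The loop body is written once and only the directive changes per build. When compiled without OpenMP support the pragmas are ignored and the loop runs serially, so the kernel stays functionally portable even where no variant applies.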

Notes

1. https://github.com/UoB-HPC/pragmatic_kernels

Acknowledgements

We would like to thank Cray Inc. for providing access to their XC40 supercomputer Swan, which hosted the Intel Xeon Broadwell, and NVIDIA K20x processors. The Intel Xeon Phi KNL was provided by the Intel Parallel Computing Center at the University of Bristol, and we would like to thank Jim Cownie at Intel for his support. We also want to thank the sponsors of this research, EPSRC and the UK Atomic Weapons Establishment.

Author information

Authors and Affiliations

Merchant Venturers Building, University of Bristol, Bristol, UK
Matt Martineau, James Price & Simon McIntosh-Smith
UK Atomic Weapons Establishment, Aldermaston, UK
Wayne Gaudin

Corresponding author

Correspondence to Matt Martineau.

Editor information

Editors and Affiliations

RIKEN AICS, Kobe, Japan
Naoya Maruyama
Lawrence Livermore National Laboratory, Livermore, California, USA
Bronis R. de Supinski
RIKEN AICS, Kobe, Japan
Mohamed Wahib

About this paper

Cite this paper

Martineau, M., Price, J., McIntosh-Smith, S., Gaudin, W. (2016). Pragmatic Performance Portability with OpenMP 4.x. In: Maruyama, N., de Supinski, B., Wahib, M. (eds) OpenMP: Memory, Devices, and Tasks. IWOMP 2016. Lecture Notes in Computer Science, vol 9903. Springer, Cham. https://doi.org/10.1007/978-3-319-45550-1_18

DOI: https://doi.org/10.1007/978-3-319-45550-1_18
Published: 21 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45549-5
Online ISBN: 978-3-319-45550-1
eBook Packages: Computer Science, Computer Science (R0)
