Skip to main content

Pragmatic Performance Portability with OpenMP 4.x

  • Conference paper
  • First Online:
Book cover OpenMP: Memory, Devices, and Tasks (IWOMP 2016)

Part of the book series: Lecture Notes in Computer Science ((LNPSE,volume 9903))

Included in the following conference series:

Abstract

In this paper we investigate the current compiler technologies supporting OpenMP 4.x features targeting a range of devices, in particular, the Cray compiler 8.5.0 targeting an Intel Xeon Broadwell and NVIDIA K20x, IBM’s OpenMP 4.5 Clang branch (clang-ykt) targeting an NVIDIA K20x, the Intel compiler 16 targeting an Intel Xeon Phi Knights Landing, and GCC 6.1 targeting an AMD APU. We outline the mechanisms that they use to map the OpenMP model onto their target architectures, and conduct performance testing with a number of representative data parallel kernels. Following this we present a discussion about the current state of play in terms of performance portability and propose some straightforward guidelines for writing performance portable code, derived from our observations. At the time of writing, developers will likely have to rely on the pre-processor for certain kernels to achieve functional portability, but we expect that future homogenisation of required directives between compilers and architectures is feasible.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    https://github.com/UoB-HPC/pragmatic_kernels.

References

  1. Bercea, G., Bertolli, C., Antao, S., Jacob, A., et al.: Performance analysis of OpenMPon a GPU using a Coral Proxy application. In: Proceedings of the 6th InternationalWorkshop on Performance Modeling, Benchmarking, and Simulation of High Performance Computing Systems, p. 2. ACM (2015)

    Google Scholar 

  2. Bertolli, C., Antao, S., Bercea, G.-T., et al.: Integrating GPU support for OpenMP offloading directives into clang. In: Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, LLVM 2015 (2015)

    Google Scholar 

  3. Bertolli, C., Antao, S.F., Eichenberger, A., et al.: Coordinating GPU threads for OpenMP 4.0 in LLVM. In: Proceedings of the LLVM Compiler Infrastructure in HPC, pp. 12–21. IEEE Press (2014)

    Google Scholar 

  4. Hart, A.: First experiences porting a parallel application to a hybrid supercomputer with OpenMP 4.0 device constructs. In: Proceedings of the OpenMP: Heterogenous Execution and Data Movements: 11th International Workshop on OpenMP, IWOMP, pp. 73–85 (2015)

    Google Scholar 

  5. Kogge, P., Shalf, J.: Exascale computing trends: adjusting to the “New Normal” for computer architecture. Comput. Sci. Eng. 15(6), 16–26 (2013)

    Article  Google Scholar 

  6. Larkin, J.: Performance portability through descriptive parallelism. Presentation at DOE Centers of Execellence Performance Portability Meeting (2016). https://asc.llnl.gov/DOE-COE-Mtg-2016/talks/2-20_Larkin.pdf

  7. Lin, P., Liao, C., Quinlan, D., et al.: Experiences of using the OpenMP accelerator model to port DOE stencil applications. In: Proceedings of the OpenMP: Heterogenous Execution and Data Movements: 11th International Workshop on OpenMP, IWOMP 2015, pp. 45–59 (2015)

    Google Scholar 

  8. Martineau, M., McIntosh-Smith, S., Boulton, M., Gaudin, W.: An evaluation of emerging many-core parallel programming models. In: Proceedings of the 7th International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM 2016 (2016)

    Google Scholar 

  9. Martineau, M., McIntosh-Smith, S., Gaudin, W.: Evaluating OpenMP 4.0’s effectiveness as a heterogeneous parallel programming model. In: Proceedings of 21st International Workship on High-Level Parallel Programming Models and Supportive Environments, HIPS 2016 (2016)

    Google Scholar 

  10. McIntosh-Smith, S., Boulton, M., Curran, D., Price, J.: On the performance portability of structured grid codes on many-core computer architectures. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds.) ISC 2014. LNCS, vol. 8488, pp. 53–75. Springer, Heidelberg (2014)

    Google Scholar 

  11. OpenMP Architecture Review Board. OpenMP Application Program Interface v4.5 (2015)

    Google Scholar 

Download references

Acknowledgements

We would like to thank Cray Inc. for providing access to their XC40 supercomputer Swan, which hosted the Intel Xeon Broadwell, and NVIDIA K20x processors. The Intel Xeon Phi KNL was provided by the Intel Parallel Computing Center at the University of Bristol, and we would like to thank Jim Cownie at Intel for his support. We also want to thank the sponsors of this research, EPSRC and the UK Atomic Weapons Establishment.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Matt Martineau .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Martineau, M., Price, J., McIntosh-Smith, S., Gaudin, W. (2016). Pragmatic Performance Portability with OpenMP 4.x. In: Maruyama, N., de Supinski, B., Wahib, M. (eds) OpenMP: Memory, Devices, and Tasks. IWOMP 2016. Lecture Notes in Computer Science(), vol 9903. Springer, Cham. https://doi.org/10.1007/978-3-319-45550-1_18

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-45550-1_18

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45549-5

  • Online ISBN: 978-3-319-45550-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics