The Productivity, Portability and Performance of OpenMP 4.5 for Scientific Applications Targeting Intel CPUs, IBM CPUs, and NVIDIA GPUs

Martineau, Matt; McIntosh-Smith, Simon

doi:10.1007/978-3-319-65578-9_13

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10468)

Included in the following conference series:

International Workshop on OpenMP

Abstract

This research considers the productivity, portability, and performance offered by the OpenMP parallel programming model from the perspective of scientific applications. We discuss important considerations for scientific application developers tackling large software projects with OpenMP, including straightforward code mechanisms that improve productivity and portability. Performance results are presented across multiple modern HPC devices, including Intel Xeon and Xeon Phi CPUs, IBM POWER8 CPUs, and NVIDIA GPUs. The results are collected for three exemplar applications (hydrodynamics, heat conduction, and neutral particle transport) using modern compilers with OpenMP support. They show that while current OpenMP implementations are able to achieve good performance across the breadth of modern hardware for memory-bandwidth-bound applications, our memory-latency-bound application performs less consistently.
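As a rough illustration of what a memory-bandwidth-bound kernel looks like under OpenMP 4.5 device constructs, the sketch below shows a STREAM-style triad written once for both host and accelerator. It is not taken from the paper's applications: the function name `triad` and its parameters are hypothetical. Whether the loop actually runs on a GPU or falls back to the host depends on the compiler and how its offloading support is configured.

```c
#include <stddef.h>

/* Hypothetical STREAM-style triad: a[i] = b[i] + scalar * c[i].
 * Each element is loaded and stored once with very little arithmetic,
 * so runtime is bounded by memory bandwidth rather than compute. */
void triad(double *a, const double *b, const double *c,
           double scalar, size_t n)
{
    /* OpenMP 4.5 combined construct: offload to the default target device
     * if one is available, otherwise execute on the host. The map clauses
     * copy b and c to the device and copy a back afterwards. A compiler
     * built without OpenMP support ignores the pragma and runs the loop
     * serially, so the same source remains valid everywhere. */
    #pragma omp target teams distribute parallel for \
        map(to: b[0:n], c[0:n]) map(from: a[0:n])
    for (size_t i = 0; i < n; ++i) {
        a[i] = b[i] + scalar * c[i];
    }
}
```

For example, calling `triad(a, b, c, 2.0, 4)` with `b = {1, 2, 3, 4}` and `c = {1, 1, 1, 1}` fills `a` with `{3, 4, 5, 6}`. Keeping the directive on a single loop like this, rather than writing separate CPU and GPU versions, is one way to trade some tuning freedom for a single portable source.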

Acknowledgements

The authors would like to thank the EPSRC for funding this research. We would also like to thank the Intel Parallel Computing Center (IPCC) at the University of Bristol for access to Intel hardware, and the EPSRC GW4 Tier 2 Isambard service for access to phase 1 of the Isambard supercomputer.

Author information

Authors and Affiliations

HPC Group, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
Matt Martineau & Simon McIntosh-Smith

Corresponding author

Correspondence to Matt Martineau.

Editor information

Editors and Affiliations

Lawrence Livermore National Laboratory, Livermore, California, USA
Bronis R. de Supinski
Sandia National Laboratories, Albuquerque, New Mexico, USA
Stephen L. Olivier
RWTH Aachen University, Aachen, Germany
Christian Terboven
Stony Brook University, Stony Brook, New York, USA
Barbara M. Chapman
RWTH Aachen University, Aachen, Germany
Matthias S. Müller

About this paper

Cite this paper

Martineau, M., McIntosh-Smith, S. (2017). The Productivity, Portability and Performance of OpenMP 4.5 for Scientific Applications Targeting Intel CPUs, IBM CPUs, and NVIDIA GPUs. In: de Supinski, B., Olivier, S., Terboven, C., Chapman, B., Müller, M. (eds) Scaling OpenMP for Exascale Performance and Portability. IWOMP 2017. Lecture Notes in Computer Science, vol 10468. Springer, Cham. https://doi.org/10.1007/978-3-319-65578-9_13

DOI: https://doi.org/10.1007/978-3-319-65578-9_13
Published: 17 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65577-2
Online ISBN: 978-3-319-65578-9
eBook Packages: Computer Science, Computer Science (R0)