Early Experiences Porting Three Applications to OpenMP 4.5

  • Conference paper
OpenMP: Memory, Devices, and Tasks (IWOMP 2016)

Abstract

Many application developers need code that runs efficiently on multiple architectures but cannot afford to maintain architecture-specific codes. With the addition of target directives to support offload accelerators, OpenMP now has the machinery to support performance-portable code development. In this paper, we describe application ports of Kripke, Cardioid, and LULESH to OpenMP 4.5 and discuss our successes and failures. Challenges encountered include how OpenMP interacts with C++ features, including classes with virtual methods and lambda functions. Also, the lack of deep copy support in OpenMP increased code complexity. Finally, the inability of GPUs to handle virtual function calls required code restructuring. Despite these challenges, we demonstrate that OpenMP obtains performance within 10% of hand-written CUDA for memory-bandwidth-bound kernels in LULESH. In addition, we show that, with a minor change to the OpenMP standard, register usage for OpenMP code can be reduced by up to 10%.
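To make the deep copy point concrete, the following is a minimal sketch, not code from the paper, of the kind of workaround the missing deep copy support tends to force. The `Field` struct and the `axpy` function are hypothetical names chosen for illustration. Because the data lives behind a `std::vector`, the sketch pulls the raw pointers into local variables and maps them as explicit array sections before offloading a simple bandwidth-bound loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for illustration: the data lives behind a
// std::vector, so mapping the object itself would not move the elements.
struct Field {
    std::vector<double> values;
};

// Computes y = a * x + y with an OpenMP 4.5 target region.
// If no device is available, or OpenMP is disabled, the loop runs on the host.
void axpy(double a, const Field& x, Field& y) {
    assert(x.values.size() == y.values.size());

    // Workaround for the lack of deep copy: hoist raw pointers and the
    // length into locals, then map the pointed-to storage explicitly.
    const double* xp = x.values.data();
    double* yp = y.values.data();
    const std::size_t n = x.values.size();

    #pragma omp target teams distribute parallel for \
        map(to: xp[0:n]) map(tofrom: yp[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

Every offloaded loop that touches such a class needs the same pointer extraction and map clauses, which is one way the added code complexity mentioned in the abstract can arise.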

The rights of this work are transferred to the extent transferable according to title 17 U.S.C. 105.

Notes

  1. https://codesign.llnl.gov/lulesh.php

  2. firstprivate: Specifies that each thread should have its own instance of a variable, initialized with the value the variable has before the parallel construct.
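As a small illustration of the firstprivate behavior described in note 2, here is a minimal sketch that is not taken from the paper. The function name `add_offset` is hypothetical. Each thread receives its own copy of `offset`, and every copy starts with the value `offset` held before the parallel construct.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper for illustration: returns a vector with out[i] = i + offset.
std::vector<int> add_offset(int n, int offset) {
    assert(n >= 0);
    std::vector<int> out(static_cast<std::size_t>(n));
    int* p = out.data();

    // firstprivate(offset): each thread gets a private copy of `offset`,
    // initialized with the value it had before entering the construct.
    #pragma omp parallel for firstprivate(offset)
    for (int i = 0; i < n; ++i) {
        p[i] = i + offset;  // reads this thread's private, pre-initialized copy
    }
    return out;
}
```

With the plain `private` clause the per-thread copies would start uninitialized, so the reads of `offset` inside the loop would not be well defined; `firstprivate` is what carries the original value into each thread.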

Corresponding author

Correspondence to Ian Karlin.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Karlin, I. et al. (2016). Early Experiences Porting Three Applications to OpenMP 4.5. In: Maruyama, N., de Supinski, B., Wahib, M. (eds) OpenMP: Memory, Devices, and Tasks. IWOMP 2016. Lecture Notes in Computer Science, vol 9903. Springer, Cham. https://doi.org/10.1007/978-3-319-45550-1_20

  • DOI: https://doi.org/10.1007/978-3-319-45550-1_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45549-5

  • Online ISBN: 978-3-319-45550-1

  • eBook Packages: Computer Science, Computer Science (R0)
