
Early Experiences Porting Three Applications to OpenMP 4.5

  • Ian Karlin (Email author)
  • Tom Scogland
  • Arpith C. Jacob
  • Samuel F. Antao
  • Gheorghe-Teodor Bercea
  • Carlo Bertolli
  • Bronis R. de Supinski
  • Erik W. Draeger
  • Alexandre E. Eichenberger
  • Jim Glosli
  • Holger Jones
  • Adam Kunen
  • David Poliakoff
  • David F. Richards
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9903)

Abstract

Many application developers need code that runs efficiently on multiple architectures, but cannot afford to maintain architecture-specific versions of their codes. With the addition of target directives to support offload to accelerators, OpenMP now has the machinery to support performance-portable code development. In this paper, we describe ports of Kripke, Cardioid, and LULESH to OpenMP 4.5 and discuss our successes and failures. Challenges encountered include how OpenMP interacts with C++ features such as classes with virtual methods and lambda functions. The lack of deep-copy support in OpenMP also increased code complexity. Finally, the inability of GPUs to handle virtual function calls required code restructuring. Despite these challenges, we demonstrate that OpenMP obtains performance within 10% of hand-written CUDA for memory-bandwidth-bound kernels in LULESH. In addition, we show that a minor change to the OpenMP standard can reduce register usage for OpenMP code by up to 10%.
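As a concrete illustration of the kind of port described above, the following minimal sketch (not taken from the paper; the loop, array names, and compiler flags are assumptions) offloads a memory-bandwidth-bound vector update with an OpenMP 4.5 target directive. It also works around the lack of deep-copy support by mapping raw pointers with explicit array sections rather than mapping the std::vector objects themselves.

    // Minimal sketch, assuming a compiler with OpenMP 4.5 offloading support,
    // e.g.: clang++ -fopenmp -fopenmp-targets=nvptx64 example.cpp
    #include <cstdio>
    #include <vector>

    int main() {
      const int n = 1 << 20;
      std::vector<double> x(n, 1.0), y(n, 2.0);
      // OpenMP 4.5 has no deep-copy support, so the std::vector objects
      // cannot be mapped directly; map the underlying raw pointers instead.
      double *px = x.data();
      double *py = y.data();
      const double a = 0.5;

      // Offload the loop: 'to' copies x to the device, 'tofrom' copies y
      // to the device and back to the host when the region ends.
      #pragma omp target teams distribute parallel for \
          map(to: px[0:n]) map(tofrom: py[0:n])
      for (int i = 0; i < n; ++i) {
        py[i] += a * px[i];
      }

      printf("y[0] = %f\n", py[0]);  // expect 2.5
      return 0;
    }

Mapping px and py as explicit array sections is the usual OpenMP 4.5 idiom when the containing C++ object cannot be copied member-by-member to the device.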

Keywords

OpenMP 4.5 · Application porting experiences · Performance portability


Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Ian Karlin (1) (Email author)
  • Tom Scogland (1)
  • Arpith C. Jacob (2)
  • Samuel F. Antao (2)
  • Gheorghe-Teodor Bercea (3)
  • Carlo Bertolli (2)
  • Bronis R. de Supinski (1)
  • Erik W. Draeger (1)
  • Alexandre E. Eichenberger (2)
  • Jim Glosli (1)
  • Holger Jones (1)
  • Adam Kunen (1)
  • David Poliakoff (1)
  • David F. Richards (1)
  1. Lawrence Livermore National Laboratory, Livermore, USA
  2. IBM T.J. Watson Research Center, Yorktown Heights, USA
  3. Department of Computing, Imperial College London, London, UK
