Abstract
Many application developers need code that runs efficiently on multiple architectures, but cannot afford to maintain architecture-specific versions. With the addition of target directives to support offload to accelerators, OpenMP now has the machinery to support performance-portable code development. In this paper, we describe ports of Kripke, Cardioid, and LULESH to OpenMP 4.5 and discuss our successes and failures. Challenges encountered include how OpenMP interacts with C++ features such as classes with virtual methods and lambda functions. In addition, the lack of deep-copy support in OpenMP increased code complexity, and the inability of GPUs to handle virtual function calls required code restructuring. Despite these challenges, we demonstrate that OpenMP obtains performance within 10% of hand-written CUDA for memory-bandwidth-bound kernels in LULESH. We also show that a minor change to the OpenMP standard can reduce register usage of OpenMP code by up to 10%.
The rights of this work are transferred to the extent transferable according to title 17 U.S.C. 105.
Notes
- firstprivate: specifies that each thread should have its own instance of a variable, initialized with the value the variable had before the parallel construct.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Karlin, I. et al. (2016). Early Experiences Porting Three Applications to OpenMP 4.5. In: Maruyama, N., de Supinski, B., Wahib, M. (eds) OpenMP: Memory, Devices, and Tasks. IWOMP 2016. Lecture Notes in Computer Science, vol 9903. Springer, Cham. https://doi.org/10.1007/978-3-319-45550-1_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45549-5
Online ISBN: 978-3-319-45550-1