
Porting the MPI Parallelized LES Model PALM to Multi-GPU Systems – An Experience Report

  • Conference paper

High Performance Computing (ISC High Performance 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9945)

Abstract

The computational power of graphics processing units (GPUs) and their availability on high performance computing (HPC) systems is rapidly evolving. However, HPC applications need to be ported to be executable on such hardware. This paper is a report on our experience of porting the MPI + OpenMP parallelized large-eddy simulation model (PALM) to a multi-GPU environment using the directive-based, high-level programming paradigm OpenACC. PALM is a Fortran-based computational fluid dynamics software package, used for the simulation of atmospheric and oceanic boundary layers to answer questions linked to fundamental atmospheric turbulence research, urban climate, wind energy and cloud physics. Development of PALM started in 1997; the project currently comprises 140 kLOC and is used on HPC systems with up to 43,200 cores. The porting took place during the GPU Hackathon TU Dresden/Forschungszentrum Jülich in Dresden, Germany, in 2016. The main challenges we faced were the legacy code base of PALM and its size. We report the methods used to disentangle performance effects from logical code defects, as well as our experiences with state-of-the-art profiling tools. We present detailed performance tests showing an overall performance on one GPU that can easily compete with up to ten CPU cores.


Notes

  1. http://docs.nvidia.com/cuda/profiler-users-guide/#command-line-profiler-control.

  2. The C++ reference is available online at http://en.cppreference.com/w/.


Acknowledgments

We would like to thank the Oak Ridge National Laboratory (US), the Nvidia Corporation (US), The Portland Group (US) and the OpenACC standards committee, as well as the Center for Information Services and High Performance Computing (ZIH) at Technische Universität Dresden and the Forschungszentrum Jülich, for organizing the OpenACC Hackathon in March 2016. We would like to personally thank Fernanda Foertter, Guido Juckeland and Dirk Pleiter for organizing the Hackathon in Dresden. Further, we express our deep gratitude to Dave Norton (The Portland Group) and Alexander Grund (HZDR, Rossendorf) for their instrumental contributions as members of the mentoring team during the Hackathon. The author team consists of three PALM developers (Knoop, Gronemeier, and Knigge) and one mentor of the Hackathon (Steinbach).

Author information

Correspondence to Helge Knoop.


Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Knoop, H., Gronemeier, T., Knigge, C., Steinbach, P. (2016). Porting the MPI Parallelized LES Model PALM to Multi-GPU Systems – An Experience Report. In: Taufer, M., Mohr, B., Kunkel, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, vol 9945. Springer, Cham. https://doi.org/10.1007/978-3-319-46079-6_35

  • DOI: https://doi.org/10.1007/978-3-319-46079-6_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46078-9

  • Online ISBN: 978-3-319-46079-6

  • eBook Packages: Computer Science, Computer Science (R0)
