Performance Portability of Earth System Models with User-Controlled GGDML Code Translation
The growing demand for performance in earth system modeling and other scientific domains pushes computing technology in diverse architectural directions. Developing models requires technical expertise and skill with tools that can exploit the hardware's capabilities. The heterogeneity of architectures complicates both the development and the maintenance of the models.
To improve the software development process of earth system models, we provide an approach that simplifies code maintenance by fostering separation of concerns while providing performance portability. We propose the use of high-level language extensions that reflect scientific concepts. Scientists can develop models in the programming language of their choice and use the language extensions optionally, wherever they need them. The code translation is driven by configurations that are kept separate from the model source code. These configurations are prepared by scientific programmers to make optimal use of the machine's features.
The main contribution of this paper is the demonstration of a user-controlled source-to-source translation technique for earth system models that are written with higher-level semantics. We discuss a flexible code translation technique that users drive through a configuration input prepared specifically to transform the code. We use this technique to produce OpenMP-enabled or OpenACC-enabled code, in addition to MPI code that supports multi-node configurations.
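The idea of a translation that is steered by a configuration kept apart from the model source can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the `foreach <index> in <count>` syntax, the `CONFIGS` dictionary, and the `translate` function are hypothetical and do not reproduce the actual GGDML extensions or the configuration format of the translation tool. Only the two emitted directives, `#pragma omp parallel for` and `#pragma acc parallel loop`, are standard OpenMP and OpenACC.

```python
import re

# Hypothetical translation configurations, kept separate from the model source.
# A scientific programmer would pick or write one per target machine.
CONFIGS = {
    "openmp": {"pragma": "#pragma omp parallel for"},
    "openacc": {"pragma": "#pragma acc parallel loop"},
}

# Hypothetical high-level iterator syntax: "foreach <index> in <count>".
# This is NOT the real GGDML grammar, only a stand-in for a language extension.
FOREACH = re.compile(r"^(\s*)foreach\s+(\w+)\s+in\s+(\w+)\s*$")


def translate(source: str, config: dict) -> str:
    """Rewrite each 'foreach' line into an annotated C for-loop header.

    Lines that do not use the extension are passed through unchanged,
    mirroring the idea that the extensions are optional.
    """
    out = []
    for line in source.splitlines():
        match = FOREACH.match(line)
        if match is None:
            out.append(line)
            continue
        indent, index, count = match.groups()
        out.append(f"{indent}{config['pragma']}")
        out.append(f"{indent}for (int {index} = 0; {index} < {count}; {index}++)")
    return "\n".join(out)


# Model code written once, with the hypothetical extension on the first line.
MODEL_SOURCE = """\
foreach c in ncells
{
    out[c] = 2.0 * in[c];
}"""

if __name__ == "__main__":
    for target in ("openmp", "openacc"):
        print(f"/* target: {target} */")
        print(translate(MODEL_SOURCE, CONFIGS[target]))
        print()
```

In this sketch the model source stays identical for both targets and only the selected configuration entry changes, which is the separation of concerns the abstract describes. A real translator would work on a parsed representation of the code rather than on regular expressions.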
Keywords: DSL, Meta-compiler, Earth system modeling, Software development, Performance portability
This work was supported in part by the German Research Foundation (DFG) through the Priority Programme 1648 “Software for Exascale Computing” (SPPEXA) (GZ: LU 1353/11-1). We would like to thank NVIDIA, who supported this work by allowing us to run some tests on their PSG cluster, and the German Climate Computing Center (DKRZ), where we also ran some tests on the Mistral supercomputer.