Hybrid Fortran: High Productivity GPU Porting Framework Applied to Japanese Weather Prediction Model

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10732)


In this work we use the GPU porting task for the operational Japanese weather prediction model “ASUCA” as an opportunity to examine productivity issues with OpenACC when applied to structured grid problems. We then propose “Hybrid Fortran”, an approach that combines the advantages of directive-based methods (no rewrite of existing code is necessary) with those of stencil DSLs (the memory layout is abstracted). This makes it possible to define multiple parallelizations with different granularities in the same code. Without compromising on performance, this approach greatly reduces the code changes required to achieve a hybrid GPU/CPU parallelization, as demonstrated by our ASUCA implementation using Hybrid Fortran.
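The idea of abstracting the memory layout can be illustrated with a small sketch. The Python code below is a hypothetical illustration only. It is not Hybrid Fortran syntax and not the ASUCA implementation, and the names `Field3D`, `column_kernel`, `run` and `simulate` are assumptions made for this example. A stencil kernel addresses a 3D field only through logical (i, j, k) indices. The storage order is chosen separately: either i varies fastest (here called IJK) or k varies fastest (KIJ). The kernel works on one vertical column per (i, j) point. The same code could therefore be driven one point at a time, as GPU threads would be, or by coarser loops in a CPU build. This loosely mirrors the idea of defining parallelizations with different granularities.

```python
"""Sketch: one stencil kernel, two storage orders (hypothetical example)."""
from dataclasses import dataclass, field
from typing import List, Tuple

IJK = "ijk"  # i varies fastest: neighbouring horizontal points are adjacent in memory
KIJ = "kij"  # k varies fastest: each vertical column is contiguous in memory


@dataclass
class Field3D:
    nx: int
    ny: int
    nz: int
    layout: str
    data: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Allocate zero-initialised flat storage if none was given.
        if not self.data:
            self.data = [0.0] * (self.nx * self.ny * self.nz)

    def offset(self, i: int, j: int, k: int) -> int:
        """Map logical indices to a position in the flat storage."""
        if self.layout == IJK:
            return i + self.nx * (j + self.ny * k)
        if self.layout == KIJ:
            return k + self.nz * (i + self.nx * j)
        raise ValueError(f"unknown layout {self.layout!r}")

    def __getitem__(self, idx: Tuple[int, int, int]) -> float:
        return self.data[self.offset(*idx)]

    def __setitem__(self, idx: Tuple[int, int, int], value: float) -> None:
        self.data[self.offset(*idx)] = value


def column_kernel(out: Field3D, inp: Field3D, i: int, j: int) -> None:
    """Vertical 3-point average over one (i, j) column; never sees the layout."""
    for k in range(1, inp.nz - 1):
        out[i, j, k] = (inp[i, j, k - 1] + inp[i, j, k] + inp[i, j, k + 1]) / 3.0


def run(out: Field3D, inp: Field3D) -> None:
    """Apply the column kernel at every horizontal point.

    Each (i, j) call reads and writes only its own column, so the calls are
    independent: a GPU build could map each one to a thread, while a CPU
    build could split the outer loop across cores.
    """
    for j in range(inp.ny):
        for i in range(inp.nx):
            column_kernel(out, inp, i, j)


def simulate(layout: str, nx: int = 4, ny: int = 3, nz: int = 5) -> Field3D:
    """Fill an input field by logical index, run the kernel, return the output."""
    inp = Field3D(nx, ny, nz, layout)
    out = Field3D(nx, ny, nz, layout)
    for k in range(nz):
        for j in range(ny):
            for i in range(nx):
                inp[i, j, k] = i + 2.0 * j + 3.0 * k * k
    run(out, inp)
    return out


if __name__ == "__main__":
    a, b = simulate(IJK), simulate(KIJ)
    same = all(a[i, j, k] == b[i, j, k]
               for k in range(a.nz) for j in range(a.ny) for i in range(a.nx))
    print("layouts agree:", same)
    print("raw storage identical:", a.data == b.data)
```

Running the script reports that both layouts give identical values at every logical index, while the raw storage differs. The kernel code relies on exactly this property, so the layout can be changed per target without touching the kernel.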


Keywords: HPC, OpenACC, CUDA, GPGPU, OpenMP, Atmospheric, Weather, Parallel programming, Granularity, Memory layout



This work has been supported by the Japan Science and Technology Agency (JST) Core Research for Evolutional Science and Technology (CREST) research program “Highly Productive, High Performance Application Frameworks for Post Peta-scale Computing”, by KAKENHI Grant-in-Aid for Scientific Research (S) 26220002 from the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan, by the “Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures (JHPCN)” and the “High Performance Computing Infrastructure (HPCI)”, as well as by the “Advanced Computation and I/O Methods for Earth-System Simulations” (AIMES) project running under the German-Japanese priority program “Software for Exascale Computing” (SPPEXA). The authors thank the Japan Meteorological Agency for their extensive support, and the University of Tokyo and the Global Scientific Information and Computing Center at Tokyo Institute of Technology for the use of their supercomputers Reedbush-H and TSUBAME 2.5.



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Tokyo Institute of Technology, Tokyo, Japan
