How good is the OpenPOWER architecture for high-performance CPU-oriented weather forecasting applications?

  • R. Moreno
  • E. Arias
  • A. Navarro
  • F. J. Tapiador


Performance, i.e., execution time, is one of the most important features of HPC software, but energy consumption is also growing in importance if applications are to be extended to exascale. This is the case for the HPC software used in weather forecasting, where every ounce of performance is critical to increasing the accuracy and precision of the results. In this work, we study the performance-energy balance of an OpenPOWER processor, which is designed for the heavy workloads typically found in data servers and HPC environments. Our results show that, on weather forecasting workloads, the OpenPOWER processor outperforms other processors commonly used in HPC, but at the expense of higher energy consumption. Furthermore, the highest hyperthreading (simultaneous multithreading, SMT) modes available on OpenPOWER processors do not perform well with HPC workloads and are even detrimental to performance.


Keywords: OpenPOWER · Performance · Energy consumption · Compilers · Weather research and forecasting



Funding from projects CGL2013-48367-P and CGL2016-80609-R (Spanish Ministry of Economy and Competitiveness, Science and Innovation) is gratefully acknowledged. RM acknowledges FPI grant EEBB-I-17-12253. AN acknowledges grant FPU13/02798.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. University of Castilla-La Mancha, Toledo, Spain
  2. University of Castilla-La Mancha, Albacete, Spain
