Abstract
Modern servers provide different features for managing the amount of energy that is needed to execute a given work-load. In this article we focus on a new generation of GPU-accelerated servers with POWER8 processors. For different scientific applications, which have in common that they have been written for massively-parallel computers, we measure energy-to-solution for different system configurations. By combining earlier developed performance models and a simple power model, we derive an energy model that can help to optimise for energy efficiency.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
- 2.
Comparison with nvidia-smi indicates an overhead of roughly 40 W measured with idle system.
- 3.
Less than 15 % of the mean value in pathological cases.
- 4.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
- 5.
Trademark of IBM in USA and/or other countries.
- 6.
We also performed fits with \(\gamma \) as a free parameter, where we found \(\gamma \simeq 3\).
References
Abraham, M.J., Murtola, T., Schulz, R., Páll, S., Smith, J.C., Hess, B., Lindahl, E.: GROMACS: high performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1, 19–25 (2015)
Alonso, P., Dolz, M.F., Mayo, R., Quintana-Ortí, E.S.: Modeling power and energy of the task-parallel Cholesky factorization on multicore processors. Comput. Sci. Res. Dev. 29(2), 105–112 (2012). doi:10.1007/s00450-012-0227-z
Baumeister, P.F., Hater, T., Kraus, J., Pleiter, D., Wahl, P.: A performance model for GPU-accelerated FDTD applications. In: 2015 IEEE 22nd International Conference on High Performance Computing (HiPC), pp. 185–193, December 2015
Beeby, J.: The density of electrons in a perfect or imperfect lattice. Proc. R. Soc. Lond. A Math. Phys. Eng. Sci. 302(1468), 113–136 (1967). The Royal Society
Bilardi, G., Pietracaprina, A., Pucci, G., Schifano, F., Tripiccione, R.: The potential of on-chip multiprocessing for QCD machines. In: Bader, D.A., Parashar, M., Sridhar, V., Prasanna, V.K. (eds.) HiPC 2005. LNCS, vol. 3769, pp. 386–397. Springer, Heidelberg (2005). doi:10.1007/11602569_41
Bui, V., Norris, B., Huck, K., McInnes, L.C., Li, L., Hernandez, O., Chapman, B.: A component infrastructure for performance and power modeling of parallel scientific applications. In: Proceedings of the 2008 compFrame/HPC-GECO Workshop on Component Based High Performance, CBHPC 2008, pp. 6:1–6:11. (2008). http://doi.acm.org/10.1145/1456190.1456199
Cabrera, A., Almeida, F., Blanco, V., Giménez, D.: Analytical modeling of the energy consumption for the High Performance Linpack. In: 2013 21st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 343–350, February 2013
Caldeira, A.B., et al.: IBM Power System S824L technical overview and introduction (2014). http://www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/redp5139.html
David, H., Gorbatov, E., Hanebutte, U.R., Khanna, R., Le, C.: RAPL: memory power estimation and capping. In: 2010 ACM/IEEE International Symposium on Low-Power Electronics and Design (ISLPED), pp. 189–194, August 2010
Demmel, J., Gearhart, A.: Instrumenting linear algebra energy consumption via on-chip energy counters. Technical report, UCB/EECS-2012-168, EECS Department, University of California, Berkeley, June 2012. http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-168.html
Dongarra, J., Ltaief, H., Luszczek, P., Weaver, V.M.: Energy footprint of advanced dense numerical linear algebra using tile algorithms on multicore architecture. In: The 2nd International Conference on Cloud and Green Computing, November 2012
Eyerman, S., Eeckhout, L.: A counter architecture for online DVFS profitability estimation. IEEE Trans. Comput. 59(11), 1576–1583 (2010)
Feng, W.C., et al.: Green500 list, November 2015. http://green500.org
Flinn, J., Satyanarayanan, M.: PowerScope: a tool for profiling the energy usage of mobile applications. In: Proceedings of the Second IEEE Workshop on Mobile Computer Systems and Applications, WMCSA 1999, p. 2 (1999). http://dl.acm.org/citation.cfm?id=520551.837522
Floyd, M., et al.: Introducing the adaptive energy management features of the POWER7 chip. IEEE Micro 31(2), 60–75 (2011)
Freund, R.W., Nachtigal, N.: QMR: a quasi-minimal residual method for non-Hermitian linear systems. Numer. Math. 60(1), 315–339 (1991)
Friedrich, J., Le, H., Starke, W., Stuechli, J., Sinharoy, B., Fluhr, E., Dreps, D., Zyuban, V., Still, G., Gonzalez, C., Hogenmiller, D., Malgioglio, F., Nett, R., Puri, R., Restle, P., Shan, D., Deniz, Z., Wendel, D., Ziegler, M., Victor, D.: The POWER8\(^{\rm {TM}}\) processor: designed for big data, analytics, and cloud environments. In: 2014 IEEE International Conference on IC Design Technology (ICICDT), pp. 1–4, May 2014
Ge, R., Feng, X., Song, S., Chang, H.C., Li, D., Cameron, K.W.: PowerPack: energy profiling and analysis of high-performance systems and applications. IEEE Trans. Parallel Distrib. Syst. 21(5), 658–671 (2010)
Ghosh, S., Chandrasekaran, S., Chapman, B.: Statistical modeling of power/energy of scientific kernels on a multi-GPU system. In: 2013 International Green Computing Conference (IGCC), pp. 1–6, June 2013
Hackenberg, D., Ilsche, T., Schuchart, J., Schöne, R., Nagel, W.E., Simon, M., Georgiou, Y.: HDEEM: high definition energy efficiency monitoring. In: Energy Efficient Supercomputing Workshop, E2SC 2014, pp. 1–10, November 2014
Isci, C., Martonosi, M.: Runtime power monitoring in high-end processors: methodology and empirical data. In: 2003 Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-36, pp. 93–104, December 2003
Klavík, P., Malossi, A.C.I., Bekas, C., Curioni, A.: Changing computing paradigms towards power efficiency. Philos. Trans. R. Soc. Lond. A: Math. Phys. Eng. Sci. 372(2018), 20130278 (2014)
Knobloch, M., Foszczynski, M., Homberg, W., Pleiter, D., Böttiger, H.: Mapping fine-grained power measurements to HPC application runtime characteristics on IBM POWER7. Comput. Sci. Res. Dev. 29(3), 211–219 (2013). doi:10.1007/s00450-013-0245-5
Kohn, W., Rostoker, N.: Solution of the Schrödinger equation in periodic lattices with an application to metallic Lithium. Phys. Rev. 94, 1111–1120 (1954)
Korringa, J.: On the calculation of the energy of a Bloch wave in a metal. Physica 13(6), 392–400 (1947)
Kraus, J.: Increase performance with GPU boost and K80 autoboost (2014). https://devblogs.nvidia.com/parallelforall/increase-performance-gpu-boost-k80-autoboost/
Lee, K.H., Ahmed, I., Goh, R.S., Khoo, E.H., Li, E.P., Hung, T.G.: Implementation of the FDTD method based on Lorentz-Drude dispersive model on GPU for plasmonics applications. Prog. Electromagnet. Res. 116, 441–456 (2011)
Lefurgy, C., Wang, X., Ware, M.: Server-level power control. In: 2007 Fourth International Conference on Autonomic Computing, ICAC 2007, p. 4, June 2007
Lindahl, E.: Molecular simulation with GROMACS on CUDA GPUs (2013). http://on-demand.gputechconf.com/gtc/2013/webinar/gromacs-kepler-gpus-gtc-express-webinar.pdf
Pleiter, D.: Parallel computer architectures. In: 45th IFF Spring School 2014 “Computing Solids Models, ab-initio methods and supercomputing”. Schriften des Forschungszentrums Jülich, Reihe Schlüsseltechnologien, vol. 74 (2014)
Rountree, B., Lowenthal, D.K., Schulz, M., de Supinski, B.R.: Practical performance prediction under dynamic voltage frequency scaling. In: 2011 International Green Computing Conference and Workshops (IGCC), pp. 1–8, July 2011
Ryffel, S.: LEA\(^2\)P: the Linux energy attribution and accounting platform. Master’s thesis, Swiss Federal Institute of Technology (ETH) (2009). http://ftp.tik.ee.ethz.ch/pub/students/2009-FS/MA-2009-04.pdf
Shahmansouri, A., Rashidian, B.: GPU implementation of split-field finite-difference time-domain method for Drude-Lorentz dispersive media. Prog. Electromagnet. Res. 125, 55–77 (2012)
Song, S., Su, C., Rountree, B., Cameron, K.W.: A simplified and accurate model of power-performance efficiency on emergent GPU architectures. In: 2013 IEEE 27th International Symposium on Parallel Distributed Processing (IPDPS), pp. 673–686, May 2013
Song, S.L., Barker, K., Kerbyson, D.: Unified performance and power modeling of scientific workloads. In: Proceedings of the 1st International Workshop on Energy Efficient Supercomputing, E2SC 2013, pp. 4:1–4:8. (2013). http://doi.acm.org/10.1145/2536430.2536435
Subramaniam, B., Feng, W.C.: Statistical power and performance modeling for optimizing the energy efficiency of scientific computing. In: Green Computing and Communications (GreenCom), pp. 139–146, December 2010
Taflove, A., Hagness, S.C.: Others: Computational Electrodynamics: The Finite-Difference Time-Domain Method. Artech House, Norwood (1995)
Tan, L., Kothapalli, S., Chen, L., Hussaini, O., Bissiri, R., Chen, Z.: A survey of power and energy efficient techniques for high performance numerical linear algebra operations. Parallel Comput. 40(10), 559–573 (2014)
Thiess, A., et al.: Massively parallel density functional calculations for thousands of atoms: KKRnano. Phys. Rev. B 85, 235103 (2012)
Tiwari, A., Laurenzano, M.A., Carrington, L., Snavely, A.: Modeling power and energy usage of HPC kernels. In: 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops PhD Forum (IPDPSW), pp. 990–998, May 2012
Wahl, P., Ly-Gagnon, D., Debaes, C., Miller, D., Thienpont, H.: B-CALM: an open-source GPU-based 3D-FDTD with multi-pole dispersion for plasmonics. In: 2011 11th International Conference on Numerical Simulation of Optoelectronic Devices (NUSOD), pp. 11–12, September 2011
Weaver, V.M., Johnson, M., Kasichayanula, K., Ralph, J., Luszczek, P., Terpstra, D., Moore, S.: Measuring energy and power with PAPI. In: 2012 41st International Conference on Parallel Processing Workshops (ICPPW), pp. 262–268, September 2012
Wittmann, M., Hager, G., Zeiser, T., Treibig, J., Wellein, G.: Chip-level and multi-node analysis of energy-optimized lattice Boltzmann CFD simulations. Concur. Comput. Pract. Exper. 28, 2295–2315 (2016). doi:10.1002/cpe.3489
Wu, G., Greathouse, J.L., Lyashevsky, A., Jayasena, N., Chiou, D.: GPGPU performance and power estimation using machine learning. In: 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), pp. 564–576, February 2015
Zyuban, V., Taylor, S.A., Christensen, B., Hall, A.R., Gonzalez, C.J., Friedrich, J., Clougherty, F., Tetzloff, J., Rao, R.: IBM POWER7+ design for higher frequency at fixed power. IBM J. Res. Dev. 57(6), 1:1–1:18 (2013)
Acknowledgements
This work has been carried out in the context of the POWER Acceleration and Design Center, a joined project between IBM, Forschungszentrum Jülich and NVIDIA. We acknowledge generous support from IBM by providing early access to GPU-accelerated POWER8 systems.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Hater, T., Anlauf, B., Baumeister, P., Bühler, M., Kraus, J., Pleiter, D. (2016). Exploring Energy Efficiency for GPU-Accelerated POWER Servers. In: Taufer, M., Mohr, B., Kunkel, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science(), vol 9945. Springer, Cham. https://doi.org/10.1007/978-3-319-46079-6_15
Download citation
DOI: https://doi.org/10.1007/978-3-319-46079-6_15
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46078-9
Online ISBN: 978-3-319-46079-6
eBook Packages: Computer ScienceComputer Science (R0)