Power Efficiency Analysis of a Deep Learning Workload on an IBM “Minsky” Platform

  • Mauricio D. Mazuecos Pérez
  • Nahuel G. Seiler
  • Carlos Sergio Bederián
  • Nicolás Wolovick
  • Augusto J. Vega
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 979)


The rise of Deep Learning techniques has drawn special attention to GPUs as a means of accelerating model computation. Most frameworks for Cognitive Computing support offloading model training and inference to graphics hardware, and this is so common that GPU designers now reserve die area for special function units tailored to accelerating Deep Learning computation. Measuring the capability of a hardware platform to run these workloads is a major concern for vendors and consumers in this exponentially growing market. In a previous work [9] we analyzed the execution times of the Fathom AI workloads [2] on CPUs and CPUs+GPUs. In this work we measure the Fathom workloads on the POWER8-based “Minsky” [15] platform, profiling GPU power consumption and energy efficiency. We explore alternative forms of execution via GPU power and frequency capping, with the aim of reducing Energy-to-Solution (ETS) and Energy-Delay-Product (EDP). We show significant ETS savings of up to 27%, with half of the workloads also decreasing their EDP. We also show the advantages of frequency capping over power capping in NVIDIA GPUs.
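To make the two metrics concrete, here is a minimal sketch of how ETS and EDP trade off under frequency capping. The power and runtime figures are hypothetical illustrations, not measurements from the paper: a cap lowers average power draw at the cost of a somewhat longer runtime, and EDP reveals whether the energy saved is worth the added delay.

```python
# Illustrative ETS/EDP comparison; all numbers below are hypothetical.

def energy_to_solution(avg_power_w: float, runtime_s: float) -> float:
    """ETS = average power (W) * time to solution (s), in joules."""
    return avg_power_w * runtime_s

def energy_delay_product(avg_power_w: float, runtime_s: float) -> float:
    """EDP = ETS * runtime (J*s); penalizes savings that slow execution."""
    return energy_to_solution(avg_power_w, runtime_s) * runtime_s

# Baseline: GPU at default clocks.
ets_base = energy_to_solution(280.0, 100.0)
edp_base = energy_delay_product(280.0, 100.0)

# Frequency-capped: lower average power, slightly longer runtime.
ets_cap = energy_to_solution(190.0, 110.0)
edp_cap = energy_delay_product(190.0, 110.0)

print(f"ETS savings: {1 - ets_cap / ets_base:.1%}")
print(f"EDP change:  {1 - edp_cap / edp_base:.1%}")
```

With these example numbers the cap saves roughly a quarter of the energy while EDP also drops, i.e. the slowdown is more than paid for by the power reduction; a harsher cap could save energy yet worsen EDP.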


Keywords: Fathom · GPU · Power capping · Frequency capping · Energy-to-Solution · Energy-Delay-Product



This work was partially funded by SeCyT-UNC 2016 grant 30720150101248CB “Heterogeneous HPC” and 2016 IBM Faculty Award “Resilient Scale-Out for Deep Learning on Power Systems”.


References

  1. Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous systems (2015)
  2. Adolf, R., Rama, S., Reagen, B., Wei, G.Y., Brooks, D.M.: Fathom: reference workloads for modern deep learning methods. CoRR abs/1608.06581 (2016)
  3. Adolf, B.: Fathom, reference workloads for modern deep learning
  4. Caldeira, A.B., Haug, V., Vetter, S.: IBM Power System S822LC for High Performance Computing Introduction and Technical Overview, 1st edn. IBM Redbooks, October 2016
  5. Cho, M., Finkler, U., Kumar, S., Kung, D.S., Saxena, V., Sreedhar, D.: PowerAI DDL. CoRR abs/1708.02188 (2017)
  6. Chollet, F.: Deep Learning with Python. Manning, Shelter Island (2018)
  7. Deng, B., et al.: Extending Moore’s law via computationally error-tolerant computing. ACM Trans. Archit. Code Optim. 15(1), 8:1–8:27 (2018)
  8. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR (2009)
  9. Guignard, M., Schild, M., Bederián, C.S., Wolovick, N., Vega, A.J.: Performance characterization of state-of-the-art deep learning workloads on a “Minsky” platform. In: HICSS 2018 (2018)
  10. Hannun, A.Y., et al.: Deep speech: scaling up end-to-end speech recognition. CoRR abs/1412.5567 (2014)
  11. @hashcat: GPU power efficiency (Hash/Watt) explained simple (2017)
  12. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. CoRR abs/1512.03385 (2015)
  13. Hinton, G., Salakhutdinov, R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
  14.
  15. IBM Corporation: IBM POWER8 specification
  16. Kingma, D.P., Welling, M.: Stochastic gradient VB and the variational auto-encoder. In: Proceedings of the 2nd International Conference on Learning Representations, ICLR 2014 (2014)
  17. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks, vol. 25, January 2012
  18. LeCun, Y., Cortes, C., Burges, C.J.: The MNIST Dataset of Handwritten Digits (1999)
  19. MLPerf: a broad ML benchmark suite for measuring performance of ML software frameworks, ML hardware accelerators, and ML cloud platforms
  20. Mnih, V., et al.: Playing Atari with deep reinforcement learning. CoRR abs/1312.5602 (2013)
  21. Seiler, N.G.: Changes to make seq2seq compatible with TensorFlow versions later than 1.x (2017)
  22. Mott, S.: Ethereum mining with NVIDIA on Linux (2017)
  23. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556 (2014)
  24. Sukhbaatar, S., Szlam, A., Weston, J., Fergus, R.: Weakly supervised memory networks. CoRR abs/1503.08895 (2015)
  25. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 27, pp. 3104–3112. Curran Associates, Inc. (2014)
  26. TOP500: June 2018 list (2018)
  27. Weston, J., Chopra, S., Bordes, A.: Memory networks. CoRR abs/1410.3916 (2014)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Mauricio D. Mazuecos Pérez (1)
  • Nahuel G. Seiler (1)
  • Carlos Sergio Bederián (1, 2)
  • Nicolás Wolovick (1), corresponding author
  • Augusto J. Vega (3)

  1. FaMAF, Universidad Nacional de Córdoba, Córdoba, Argentina
  2. CONICET, Buenos Aires, Argentina
  3. IBM T. J. Watson Research Center, Yorktown Heights, USA