Modeling and Evaluation of Application-Aware Dynamic Thermal Control in HPC Nodes

  • Daniele Cesarini
  • Andrea Bartolini
  • Luca Benini
Conference paper
Part of the IFIP Advances in Information and Communication Technology book series (IFIPAICT, volume 500)

Abstract

As a side effect of the end of Dennard scaling, power and thermal walls stand in the way of the evolution of supercomputers towards the exaflops era. Overcoming these energy and temperature walls is a major challenge for sustaining performance growth in the future. New-generation architectures for HPC systems implement HW and SW components that address energy and thermal issues to enable power-efficient computing for scientific workloads. In thermal-bound HPC machines, workload-aware runtimes can leverage hardware knobs to guarantee the best operating point in terms of performance and power saving without violating thermal constraints.

In this paper, we present an integer-linear programming formulation for job mapping and frequency selection for thermal-bound HPC nodes. We use a fast solver and workload traces extracted from a real supercomputer to test our methodology. Our runtime is integrated into the MPI library, and it is capable of assigning high-performance cores to performance-critical processes. Critical processes are identified at execution time through a mathematical formulation, which relies on the characterization of the application workload and on the global synchronization barriers. We demonstrate that by combining long and short horizon predictions with information on the critical processes retrieved from the programming model, we can drastically improve the performance of the target application w.r.t. state-of-the-art DTM solutions.
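The abstract does not state the formulation itself, so the following is only a minimal sketch of the kind of problem it describes: choosing a task-to-core mapping and a per-core frequency so that the time to the next global barrier is minimized while no core exceeds a thermal limit. Everything in it is an assumption made for illustration: the frequency levels, the cubic power model, the steady-state thermal model with a coupling term, all constants, and the names `power_w`, `temperatures`, and `best_schedule`. Exhaustive enumeration stands in for an ILP solver, which is reasonable only for toy sizes.

```python
from itertools import permutations, product

# All values below are hypothetical and chosen only for illustration.
FREQS_GHZ = (1.2, 1.8, 2.4)    # available frequency levels per core
T_AMBIENT_C = 40.0             # ambient temperature
T_MAX_C = 80.0                 # thermal constraint per core
SELF_RES = (3.0, 4.0, 5.0)     # self-heating of each core, degC per W
COUPLING = 0.5                 # heating from the other cores, degC per W


def power_w(freq_ghz):
    """Toy power model: power grows with the cube of frequency (assumed)."""
    return 0.5 * freq_ghz ** 3


def temperatures(freqs):
    """Steady-state temperature of each core for a given frequency vector."""
    powers = [power_w(f) for f in freqs]
    total = sum(powers)
    return [
        T_AMBIENT_C + SELF_RES[i] * p + COUPLING * (total - p)
        for i, p in enumerate(powers)
    ]


def best_schedule(work_gcycles):
    """Search every mapping and frequency vector; return (time, mapping, freqs).

    mapping[task] is the core that runs the task. The time to the barrier
    is set by the slowest (critical) task. Returns None if no frequency
    vector satisfies the thermal constraint.
    """
    n = len(work_gcycles)
    best = None
    for freqs in product(FREQS_GHZ, repeat=n):
        if max(temperatures(freqs)) > T_MAX_C:
            continue  # this frequency vector violates the thermal limit
        for mapping in permutations(range(n)):
            barrier_time = max(
                work_gcycles[task] / freqs[core]
                for task, core in enumerate(mapping)
            )
            if best is None or barrier_time < best[0]:
                best = (barrier_time, mapping, freqs)
    return best
```

Calling `best_schedule((12.0, 6.0, 6.0))` treats the first task as the critical one, since it carries twice the work of the others. With these made-up constants, running all three cores at the top frequency is thermally infeasible, so the search has to decide which core gets the high frequency and which task runs on it.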

Keywords

HPC, Thermal model, Power model, Workload model, Energy saving, Thermal constraint, DTM, MPI, Runtime, ILP, Quantum ESPRESSO

Notes

Acknowledgments

Work supported by the EU FETHPC project ANTAREX (g.a. 671623), EU project ExaNoDe (g.a. 671578), and EU ERC Project MULTITHERMAN (g.a. 291125).

Copyright information

© IFIP International Federation for Information Processing 2019

Authors and Affiliations

  • Daniele Cesarini (1)
  • Andrea Bartolini (1)
  • Luca Benini (1, 2)

  1. DEI, University of Bologna, Bologna, Italy
  2. IIS, Swiss Federal Institute of Technology, Zurich, Switzerland
