
SMCis: Scientific Applications Monitoring and Prediction for HPC Environments

  • Gabrieli Silva
  • Vinícius Klôh
  • André Yokoyama
  • Matheus Gritz
  • Bruno Schulze
  • Mariza Ferro
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1171)

Abstract

Understanding the computational requirements of scientific applications and their relation to power consumption is fundamental to overcoming the current barriers to exascale computing. This goal, however, imposes several challenging requirements on a monitoring tool: it must track a wide range of parameters in heterogeneous environments, provide fine-grained profiling of performance and of the power consumed by different components, remain language independent, and avoid code instrumentation. Considering these challenges, this work proposes SMCis, an application monitoring tool designed to collect all of these aspects effectively and accurately and to correlate the collected data graphically through its analysis and visualization environment. In addition, SMCis integrates and facilitates the use of Machine Learning tools for building predictive models of runtime and power consumption.
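
The abstract's description of language-independent, instrumentation-free monitoring combined with machine-learning prediction can be pictured with a short sketch. The Python snippet below is an illustrative assumption, not SMCis code: it treats the target application as a black box, samples system-wide CPU and memory metrics with psutil while the application runs, and then fits a scikit-learn regressor on hypothetical per-run features to predict runtime. All function names, feature names, commands, and numbers are invented for the example.

    # Illustrative sketch only: not the SMCis implementation. It mimics the idea of
    # sampling system metrics around a black-box run and learning a predictive model.
    import subprocess
    import time

    import psutil                                   # system-wide metrics, no code instrumentation
    from sklearn.ensemble import RandomForestRegressor

    def monitor(cmd, interval=0.5):
        """Run `cmd` as a black box and sample CPU/memory usage until it exits."""
        proc = subprocess.Popen(cmd)
        samples, start = [], time.time()
        while proc.poll() is None:
            samples.append({
                "t": time.time() - start,
                "cpu_percent": psutil.cpu_percent(interval=None),
                "mem_percent": psutil.virtual_memory().percent,
            })
            time.sleep(interval)
        return samples, time.time() - start         # per-sample traces and total runtime (s)

    # e.g. samples, runtime = monitor(["./my_app", "input.dat"])   # hypothetical command

    # Toy predictive model: per-run features -> measured runtime. The feature set
    # [problem size, threads, mean CPU load] and all values are assumptions.
    X = [[1024, 4, 72.0], [2048, 4, 80.5], [2048, 8, 65.3]]
    y = [12.4, 25.1, 14.8]                          # measured runtimes in seconds
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    print(model.predict([[4096, 8, 70.0]]))         # predicted runtime for a new run

The same pattern would extend to power: if readings from a power meter or from RAPL counters were sampled alongside the psutil metrics, the regressor's target could be energy instead of runtime.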

Keywords

HPC · Monitoring tools · Energy · Performance

Notes

Acknowledgments

This work received financial support from CNPq, from the EU H2020 Programme and MCTI/RNP-Brazil within the scope of the HPC4E project (grant agreement No. 689772), and from FAPERJ (process number 26/202.500/2018) and CAPES.


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. National Laboratory for Scientific Computing (LNCC), Petrópolis, Brazil
