Monitoring energy consumption with SIOX

Autonomous monitoring triggered by abnormal energy consumption

  • Special Issue Paper
Computer Science - Research and Development

Abstract

In the face of the growing complexity of HPC systems, their rising energy costs, and the increasing difficulty of running applications efficiently, a number of monitoring tools have been developed in recent years. SIOX is one such endeavor, with a uniquely holistic approach: it aims not merely to record one particular kind of data, but to make all relevant data available for analysis and optimization. Among other sources, this encompasses data from hardware energy counters and trace data from different hardware/software layers. However, not all data that can be recorded should be recorded. SIOX therefore needs good heuristics to determine when and what data to collect, and energy consumption can provide an important signal that the system is in a state that deserves closer attention. In this paper, we show that SIOX can use Likwid to collect and report the energy consumption of applications, and present how this data can be visualized using SIOX’s web interface. Furthermore, we outline how SIOX can use this information to intelligently adjust the amount of data it collects, allowing it to reduce the monitoring overhead while still providing complete information about critical situations.
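
The idea of letting energy consumption trigger closer monitoring can be illustrated with a minimal sketch. This is not SIOX's implementation: the class name `EnergyTrigger`, the function `monitoring_level`, the window size, and the three-sigma threshold are all hypothetical choices made for illustration. The sketch assumes power samples in watts are already available, for example derived from hardware energy counters, and it flags a sample as abnormal when it deviates strongly from a rolling baseline.

```python
from collections import deque
from statistics import mean, stdev


class EnergyTrigger:
    """Hypothetical helper: flags power samples far from the recent baseline."""

    def __init__(self, window=30, threshold=3.0, min_samples=5):
        self.history = deque(maxlen=window)  # rolling window of past samples
        self.threshold = threshold           # allowed deviation in std. deviations
        self.min_samples = min_samples       # samples needed before judging

    def observe(self, watts):
        """Return True if `watts` is abnormal relative to the rolling window."""
        abnormal = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = stdev(self.history)
            # Guard against a perfectly flat baseline (sigma == 0).
            abnormal = abs(watts - mu) > self.threshold * max(sigma, 1e-9)
        # Always record the sample, so a sustained shift becomes the new baseline.
        self.history.append(watts)
        return abnormal


def monitoring_level(trigger, watts):
    """Map a power sample to an (assumed) two-level monitoring verbosity."""
    return "detailed" if trigger.observe(watts) else "minimal"


if __name__ == "__main__":
    trigger = EnergyTrigger()
    for sample in [100, 101, 99, 100, 102, 100, 180]:
        print(sample, monitoring_level(trigger, sample))
```

Because every sample is appended to the window, a lasting change in power draw is reported only until the baseline adapts. A real system would likely need a more careful policy for how long detailed collection stays enabled.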

Notes

  1. These values have been queried from the model-specific registers (MSR) and verified.

Acknowledgments

We want to express our gratitude to the German Aerospace Center (DLR) as responsible project agency and to the Federal Ministry of Education and Research (BMBF) for the financial support under grant 01IH11008 A-C.

Author information

Corresponding author

Correspondence to Julian M. Kunkel.

About this article

Cite this article

Kunkel, J.M., Aguilera, A., Hübbe, N. et al. Monitoring energy consumption with SIOX. Comput Sci Res Dev 30, 125–133 (2015). https://doi.org/10.1007/s00450-014-0271-y

  • DOI: https://doi.org/10.1007/s00450-014-0271-y
