Abstract
In the face of the growing complexity of HPC systems, their rising energy costs, and the increasing difficulty of running applications efficiently, a number of monitoring tools have been developed in recent years. SIOX is one such endeavor, with a uniquely holistic approach: rather than recording only one particular kind of data, it aims to make all relevant data available for analysis and optimization. Among other sources, this encompasses data from hardware energy counters and trace data from different hardware/software layers. However, not all data that can be recorded should be recorded. SIOX therefore needs good heuristics to determine when and what data to collect, and energy consumption can provide an important signal about when the system is in a state that deserves closer attention. In this paper, we show that SIOX can use Likwid to collect and report the energy consumption of applications, and present how this data can be visualized using SIOX’s web interface. Furthermore, we outline how SIOX can use this information to intelligently adjust the amount of data it collects, allowing it to reduce the monitoring overhead while still providing complete information about critical situations.
Notes
These values have been queried from the model-specific registers (MSR) and verified.
Acknowledgments
We want to express our gratitude to the German Aerospace Center (DLR) as the responsible project agency and to the Federal Ministry of Education and Research (BMBF) for the financial support under grant 01IH11008 A-C.
Cite this article
Kunkel, J.M., Aguilera, A., Hübbe, N. et al. Monitoring energy consumption with SIOX. Comput Sci Res Dev 30, 125–133 (2015). https://doi.org/10.1007/s00450-014-0271-y
DOI: https://doi.org/10.1007/s00450-014-0271-y