A New Scalable Monitoring Tool Using Performance Properties of HPC Systems

Guillen, Carla; Hesse, Wolfram; Brehm, Matthias

doi:10.1007/978-3-642-24025-6_5



Abstract

We present a prototype monitoring and analysis tool for system-wide monitoring of high performance computers. The tool uses formal specifications of performance properties that are based on hardware counters. These properties evaluate performance at three granularities: core, application and partition. The information obtained is aimed at assessing single-node performance as well as parallel execution performance. The goal is to identify performance bottlenecks in running applications and to characterise general system behaviour. The prototype achieves scalability on highly parallel machines through a distributed software architecture. An analysis agent runs on each partition, and these agents communicate with a high-level agent using a communication protocol based on TCP/IP. The main task of the high-level agent is to synchronise the other agents. In addition, the analysis agents can use OpenMP within each partition to parallelise their monitoring tasks. We handle the large volume of monitoring data through data reduction: only the properties that detect a bottleneck are stored, so the quality of the needed monitoring information is not compromised.
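The data-reduction idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Property` and `Finding` types, the counter names `cycles` and `stall_cycles`, and the threshold of 0.5 are all hypothetical choices made for illustration. The sketch evaluates one property per core from counter samples and keeps only the results that indicate a bottleneck.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# One sample of hardware counter values for a single core.
# The counter names used below are hypothetical.
Counters = Dict[str, float]


@dataclass(frozen=True)
class Property:
    """Hypothetical performance property defined over hardware counters."""
    name: str
    severity: Callable[[Counters], float]  # maps counters to a severity value
    threshold: float                       # bottleneck if severity exceeds this


@dataclass(frozen=True)
class Finding:
    """A stored result: a property that detected a bottleneck on a core."""
    core: int
    prop: str
    severity: float


def evaluate(props: List[Property], samples: Dict[int, Counters]) -> List[Finding]:
    """Evaluate every property on every core and keep only detected bottlenecks.

    Discarding results at or below the threshold is the data-reduction step.
    """
    findings = []
    for core, counters in sorted(samples.items()):
        for p in props:
            s = p.severity(counters)
            if s > p.threshold:
                findings.append(Finding(core, p.name, s))
    return findings


def stall_fraction(c: Counters) -> float:
    """Fraction of cycles spent stalled; 0.0 if no cycles were counted."""
    return c["stall_cycles"] / c["cycles"] if c["cycles"] else 0.0


# Assumed property and threshold, chosen only for this example.
PROPS = [Property("high_stall_fraction", stall_fraction, 0.5)]

samples = {
    0: {"cycles": 1000.0, "stall_cycles": 200.0},  # below threshold, not stored
    1: {"cycles": 1000.0, "stall_cycles": 700.0},  # above threshold, stored
}

findings = evaluate(PROPS, samples)
for f in findings:
    print(f.core, f.prop, round(f.severity, 2))
```

In this sketch only core 1 produces a stored finding, so the amount of retained data grows with the number of detected bottlenecks rather than with the number of monitored cores. Aggregation to application or partition granularity would be a further step that is not shown here.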



Acknowledgements

This work is funded by BMBF under the ISAR project, grant 01IH08005.

Author information

Authors and Affiliations

Leibniz Supercomputing Centre, Boltzmannstr. 1, 85748, Garching, Germany
Carla Guillen, Wolfram Hesse & Matthias Brehm


Corresponding author

Correspondence to Wolfram Hesse.

Editor information

Editors and Affiliations

Technical University Darmstadt, Mornewegstr. 30, Darmstadt, 64293, Germany
Christian Bischof
Leibniz Rechenzentrum (LRZ), Boltzmannstr. 1, Garching, 85748, Germany
Heinz-Gerd Hegering
Center for Information Services and High Performance Computing (ZIH), Technical University Dresden, Helmholtzstr. 10, Dresden, 01062, Germany
Wolfgang E. Nagel
Goethe Center for Scientific Computing, Goethe University Frankfurt, Kettenhofweg 139, Frankfurt, 60325, Hessen, Germany
Gabriel Wittum


About this paper

Cite this paper

Guillen, C., Hesse, W., Brehm, M. (2011). A New Scalable Monitoring Tool Using Performance Properties of HPC Systems. In: Bischof, C., Hegering, HG., Nagel, W., Wittum, G. (eds) Competence in High Performance Computing 2010. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24025-6_5


DOI: https://doi.org/10.1007/978-3-642-24025-6_5
Published: 08 November 2011
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24024-9
Online ISBN: 978-3-642-24025-6
eBook Packages: Computer Science; Computer Science (R0)
