A New Scalable Monitoring Tool Using Performance Properties of HPC Systems

  • Conference paper
Competence in High Performance Computing 2010

Abstract

We present a prototype monitoring and analysis tool for system-wide monitoring of high performance computers. The tool uses formally specified properties based on hardware counters. These properties evaluate performance at three granularities: core, application, and partition. The information obtained is meant to characterise both single-node performance and parallel execution performance, with the goal of identifying performance bottlenecks in running applications as well as general system behaviour. The prototype scales to highly parallel machines through a distributed software architecture: one analysis agent runs on each partition, and these agents communicate with a high-level agent over a TCP/IP-based protocol. The main task of the high-level agent is to synchronise the other agents. In addition, the analysis agents can use OpenMP within each partition to parallelise their monitoring tasks. We handle the large volume of monitoring data by data reduction: only the properties that detect a bottleneck are stored, so the quality of the needed monitoring information is not compromised.
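The data-reduction idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Property`, `Finding`, `evaluate_partition`, and `stall_fraction`, the counter names `cycles` and `stall_cycles`, and the 0.5 threshold are all hypothetical and chosen only for illustration. The sketch evaluates one counter-based property per core and keeps a record only when the property holds.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical representation: one dict of hardware-counter values per core.
Counters = Dict[str, float]


@dataclass(frozen=True)
class Property:
    """A performance property: a severity function plus a threshold."""
    name: str
    severity: Callable[[Counters], float]  # larger value means worse
    threshold: float                       # property holds above this value


@dataclass(frozen=True)
class Finding:
    """What gets stored: only properties that detected a bottleneck."""
    core: int
    prop: str
    severity: float


def evaluate_partition(props: List[Property],
                       samples: Dict[int, Counters]) -> List[Finding]:
    """Evaluate every property on every core; keep only those that hold."""
    findings = []
    for core, counters in sorted(samples.items()):
        for p in props:
            s = p.severity(counters)
            if s > p.threshold:  # data reduction: drop non-bottlenecks
                findings.append(Finding(core, p.name, s))
    return findings


def stall_fraction(c: Counters) -> float:
    """Fraction of cycles spent stalled (assumed counter names)."""
    cycles = c.get("cycles", 0.0)
    return c.get("stall_cycles", 0.0) / cycles if cycles else 0.0


# Assumed example property and made-up sample data for two cores.
HIGH_STALLS = Property("high_stall_fraction", stall_fraction, 0.5)
SAMPLES = {
    0: {"cycles": 1000.0, "stall_cycles": 100.0},
    1: {"cycles": 1000.0, "stall_cycles": 800.0},
}

findings = evaluate_partition([HIGH_STALLS], SAMPLES)
for f in findings:
    print(f.core, f.prop, f.severity)
```

In this sketch only core 1 produces a stored record, because core 0 stays below the threshold. An analysis agent in the described architecture would presumably forward such reduced records rather than raw counter values, although the abstract does not specify the message format.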

Acknowledgements

This work is funded by BMBF under the ISAR project, grant 01IH08005.

Author information

Corresponding author

Correspondence to Wolfram Hesse.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Guillen, C., Hesse, W., Brehm, M. (2011). A New Scalable Monitoring Tool Using Performance Properties of HPC Systems. In: Bischof, C., Hegering, HG., Nagel, W., Wittum, G. (eds) Competence in High Performance Computing 2010. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24025-6_5

  • DOI: https://doi.org/10.1007/978-3-642-24025-6_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24024-9

  • Online ISBN: 978-3-642-24025-6

  • eBook Packages: Computer Science (R0)
