Abstract
Ganglia [1] is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. We propose an improved Ganglia-like cluster monitoring system that remains reliable when federation nodes and their associated links fail; restricts access to some monitoring data by permission; adds control functions, such as restarting or shutting down misbehaving processes; sends email or pager alerts to the cluster administrator when important events occur; and, to speed up WAN access, optionally forwards only selected data to the federation node based on a user policy. We have implemented a prototype system.
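The idea of forwarding only policy-selected data to the federation node can be illustrated with a minimal Python sketch. It is not the authors' implementation. The simplified XML layout (CLUSTER, HOST and METRIC elements, with the metric name in a NAME attribute), the policy set `FORWARD_POLICY`, and the function `filter_for_federation` are all hypothetical. They are chosen only to show how dropping unselected metrics before forwarding reduces what crosses the WAN.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical user policy: the only metric names this cluster forwards to the
# federation node. All other metrics stay local to the cluster.
FORWARD_POLICY = {"load_one", "mem_free"}


def filter_for_federation(xml_text: str, allowed: set[str]) -> str:
    """Return a copy of a Ganglia-style XML report keeping only allowed metrics.

    Assumes a simplified CLUSTER > HOST > METRIC layout in which each METRIC
    element carries its name in a NAME attribute (an assumption of this sketch).
    """
    root = ET.fromstring(xml_text)
    # Materialize the host list first: ElementTree leaves the result undefined
    # if the tree is modified while an iter() traversal is in progress.
    for host in list(root.iter("HOST")):
        dropped = [m for m in host.findall("METRIC")
                   if m.get("NAME") not in allowed]
        for metric in dropped:
            host.remove(metric)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical report from one cluster with one host and three metrics.
    report = (
        '<GANGLIA_XML><CLUSTER NAME="c1"><HOST NAME="n1">'
        '<METRIC NAME="load_one" VAL="0.42"/>'
        '<METRIC NAME="proc_run" VAL="3"/>'
        '<METRIC NAME="mem_free" VAL="1024"/>'
        '</HOST></CLUSTER></GANGLIA_XML>'
    )
    # Only load_one and mem_free survive; proc_run is filtered out.
    print(filter_for_federation(report, FORWARD_POLICY))
```

Filtering at the cluster side, before data leaves the local network, is the design choice that matters for WAN access. The federation node never receives the dropped metrics, so both bandwidth and its parsing load shrink.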
This research was supported by Guangdong Key Laboratory of Computer Network under grant 2002B60113.
References
[1] Massie, M.L., Chun, B.N., Culler, D.E.: The Ganglia Distributed Monitoring System: Design, Implementation, and Experience (February 2003) (submitted for publication)
[2] The TeraGrid Project: TeraGrid project web page (2001), http://www.teragrid.org
[3] Foster, I., Kesselman, C.: Globus: A metacomputing infrastructure toolkit. International Journal of Supercomputer Applications 11(2), 115–128 (1997)
[4] Sottile, M., Minnich, R.: Supermon: A high speed cluster monitoring system. In: Proceedings of Cluster (September 2002)
[5] Anderson, E., Patterson, D.: Extensible, scalable monitoring for clusters of computers. In: Proceedings of the 11th Systems Administration Conference (October 1997)
[6] Amir, E., McCanne, S., Katz, R.H.: An active service framework and its application to realtime multimedia transcoding. In: Proceedings of the ACM SIGCOMM 1998 Conference on Communications Architectures and Protocols, pp. 178–189 (1998)
[7] Chun, B.N., Culler, D.E.: Rexec: A decentralized, secure remote execution environment for clusters. In: Proceedings of the 4th Workshop on Communication, Architecture and Applications for Network based Parallel Computing (January 2000)
[8] Harary, F.: Graph Theory. Addison-Wesley, Reading (1969)
[9] Peterson, L., Culler, D., Anderson, T., Roscoe, T.: A blueprint for introducing disruptive technology into the internet. In: Proceedings of the 1st Workshop on Hot Topics in Networks, HotNets-I (October 2002)
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wei, W., Dong, S., Zhang, L., Liang, Z. (2004). An Improved Ganglia-Like Clusters Monitoring System. In: Li, M., Sun, XH., Deng, Q., Ni, J. (eds) Grid and Cooperative Computing. GCC 2003. Lecture Notes in Computer Science, vol 3033. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24680-0_12
DOI: https://doi.org/10.1007/978-3-540-24680-0_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21993-4
Online ISBN: 978-3-540-24680-0
eBook Packages: Springer Book Archive