Abstract
An important part of managing global-scale distributed systems is a monitoring system that can track, in real time, many site facilities, networks, and tasks in progress. The monitoring information gathered is essential for developing the required higher-level services, for the components that provide decision support and some degree of automated decision-making, and for maintaining and optimizing workflow in large-scale distributed systems. In this chapter, we present the role, models, technologies, and structure of monitoring platforms designed for large-scale distributed systems. We also survey existing work and trends in distributed systems monitoring, introducing the concepts and requirements involved, along with techniques, models, and related standardization activities.
Copyright information
© 2011 Springer-Verlag London Limited
About this chapter
Cite this chapter
Dobre, C. (2011). Monitoring and Controlling Grid Systems. In: Preve, N. (ed.) Grid Computing. Computer Communications and Networks. Springer, London. https://doi.org/10.1007/978-0-85729-676-4_7
DOI: https://doi.org/10.1007/978-0-85729-676-4_7
Publisher Name: Springer, London
Print ISBN: 978-0-85729-675-7
Online ISBN: 978-0-85729-676-4
eBook Packages: Computer Science, Computer Science (R0)