Monitoring and Controlling Grid Systems

  • Chapter
Grid Computing

Part of the book series: Computer Communications and Networks ((CCN))

Abstract

An important part of managing global-scale distributed systems is a monitoring system that can track, in real time, many site facilities, networks, and tasks in progress. The monitoring information it gathers is essential for developing the required higher-level services and the components that provide decision support and some degree of automated decision-making, and for maintaining and optimizing workflow in large-scale distributed systems. In this chapter, we present the role, models, technologies, and structure of monitoring platforms designed for large-scale distributed systems. We also survey existing work and trends in distributed systems monitoring, introducing the concepts and requirements involved, along with techniques, models, and related standardization activities.


Author information

Correspondence to Ciprian Dobre.

Copyright information

© 2011 Springer-Verlag London Limited

About this chapter

Cite this chapter

Dobre, C. (2011). Monitoring and Controlling Grid Systems. In: Preve, N. (ed.) Grid Computing. Computer Communications and Networks. Springer, London. https://doi.org/10.1007/978-0-85729-676-4_7

  • DOI: https://doi.org/10.1007/978-0-85729-676-4_7

  • Publisher Name: Springer, London

  • Print ISBN: 978-0-85729-675-7

  • Online ISBN: 978-0-85729-676-4

  • eBook Packages: Computer Science, Computer Science (R0)
