Grid Computing pp 171-201

Monitoring and Controlling Grid Systems

  • Ciprian Dobre
Part of the Computer Communications and Networks book series (CCN)


An important part of managing global-scale distributed systems is a monitoring system that can track, in real time, many site facilities, networks, and tasks in progress. The monitoring information gathered is essential for developing the required higher-level services, the components that provide decision support and some degree of automated decision-making, and for maintaining and optimizing workflow in large-scale distributed systems. In this chapter, we present the role, models, technologies, and structure of monitoring platforms designed for large-scale distributed systems. We also survey existing work and trends in distributed systems monitoring, introducing the relevant concepts and requirements, techniques, models, and related standardization activities.


Grid Resource; Proxy Service; Monitoring Service; Monitoring Information; Simple Network Management Protocol
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  1. Computer Science Department, Faculty of Automatic Controls and Computers, University Politehnica of Bucharest, Bucharest, Romania
