
Grid Computing, pp. 171-201

Monitoring and Controlling Grid Systems

  • Ciprian Dobre
Chapter
Part of the Computer Communications and Networks book series (CCN)

Abstract

An important part of managing global-scale distributed systems is a monitoring system that can track, in real time, many site facilities, networks, and tasks in progress. The monitoring information gathered is essential for developing the required higher-level services and the components that provide decision support and some degree of automated decision-making, and for maintaining and optimizing workflow in large-scale distributed systems. In this chapter, we present the role, models, technologies, and structure of monitoring platforms designed for large-scale distributed systems. We also survey existing work and trends in distributed systems monitoring, introducing the concepts and requirements involved, along with the techniques, models, and related standardization activities.

Keywords

Grid Resource; Proxy Service; Monitoring Service; Monitoring Information; Simple Network Management Protocol


Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  1. Computer Science Department, Faculty of Automatic Controls and Computers, University Politehnica of Bucharest, Bucharest, Romania
