
Monitoring and management-support of distributed systems

  • Technical Paper
  • Conference paper
Progress in Distributed Operating Systems and Distributed Systems Management

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 433)

Abstract

This paper describes a tool for on-line monitoring of distributed systems. The tool is a hybrid monitor, combining a hardware component with a software layer, that presents the interactive user and the local operating system with high-level information about, and a performance evaluation of, the activities in the host system with minimal interference. Special hardware support, in the form of a test and measurement processor (TMP), was designed and implemented in the nodes of an experimental multicomputer system. The main function of the TMP is to execute software that monitors the local system behavior and measures the performance of both the resident operating system and the application software. The TMP can also be used to execute low-level operating system functions, to manage local resources, and to trigger time-driven events, thereby reducing the overhead of the host operating system. The operation of the TMP is completely transparent to users and imposes a minimal overhead, less than 0.1%, on the hardware system. In the experimental system, all the TMPs were connected to a central monitoring station over an independent communication network to provide a global view of the monitored system. The central monitoring station displays the resulting information in easy-to-read charts and graphs. Our experience with the TMP shows that it promotes an improved understanding of run-time behavior and supports performance measurements from which qualitative and quantitative assessments of distributed systems can be derived.

D. Haban is with the International Computer Science Institute, 1947 Center Street, Berkeley, CA 94704.

D. Wybranietz is with the Department of Computer Science, University of Kaiserslautern, D-6750 Kaiserslautern, W. Germany.

A. Barak is with the International Computer Science Institute, 1947 Center Street, Berkeley, CA 94704, on leave from The Hebrew University of Jerusalem, Israel.




Editor information

Wolfgang Schröder-Preikschat, Wolfgang Zimmer


Copyright information

© 1990 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Haban, D., Wybranietz, D., Barak, A. (1990). Monitoring and management-support of distributed systems. In: Schröder-Preikschat, W., Zimmer, W. (eds) Progress in Distributed Operating Systems and Distributed Systems Management. Lecture Notes in Computer Science, vol 433. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-52609-9_80


  • DOI: https://doi.org/10.1007/3-540-52609-9_80


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-52609-4

  • Online ISBN: 978-3-540-47074-8

  • eBook Packages: Springer Book Archive
