Abstract
This paper describes a tool for on-line monitoring of distributed systems. The tool is a hybrid monitor, combining a hardware component with a software layer, that presents the interactive user and the local operating system with high-level information about, and a performance evaluation of, the activities in the host system, with minimal interference. Special hardware support, in the form of a test and measurement processor (TMP), was designed and implemented in the nodes of an experimental multicomputer system. The main function of the TMP is to execute software that monitors the local system behavior and measures the performance of both the resident operating system and the application software. The TMP can also execute low-level operating system functions, manage local resources, and trigger time-driven events in order to reduce the overhead of the host operating system. The operations of the TMP are completely transparent to users and impose an overhead of less than 0.1% on the hardware system. In the experimental system, all the TMPs were connected to a central monitoring station through an independent communication network in order to provide a global view of the monitored system. The central monitoring station displays the resulting information in easy-to-read charts and graphs. Our experience with the TMP shows that it promotes an improved understanding of run-time behavior and supports performance measurements from which qualitative and quantitative assessments of distributed systems can be derived.
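The division of labor described above, with one monitor per node reporting to a central station that builds the global view, can be illustrated with a minimal software-only sketch. It is not the TMP implementation, which is dedicated hardware; the class names, the event labels, and the timestamp unit are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    node_id: int      # node whose monitor recorded the event
    timestamp: float  # local time in seconds (hypothetical unit)
    kind: str         # hypothetical event label, e.g. "context_switch"


class CentralStation:
    """Collects events from all node monitors and builds a global view."""

    def __init__(self):
        self.events = []

    def receive(self, event):
        self.events.append(event)

    def counts_per_node(self):
        # Aggregate the raw events into per-node, per-kind counts.
        counts = defaultdict(lambda: defaultdict(int))
        for e in self.events:
            counts[e.node_id][e.kind] += 1
        return {node: dict(kinds) for node, kinds in counts.items()}


class NodeMonitor:
    """Stands in for one node's TMP: records host events and forwards them."""

    def __init__(self, node_id, station):
        self.node_id = node_id
        self.station = station

    def record(self, timestamp, kind):
        # In the paper the forwarding uses an independent network;
        # here it is a plain method call (an assumption of this sketch).
        self.station.receive(Event(self.node_id, timestamp, kind))


station = CentralStation()
monitors = [NodeMonitor(i, station) for i in range(3)]
monitors[0].record(0.001, "context_switch")
monitors[0].record(0.004, "message_sent")
monitors[1].record(0.002, "context_switch")
monitors[0].record(0.009, "context_switch")
summary = station.counts_per_node()
```

Here `summary` holds the per-node event counts that a display layer could turn into charts. Node 2 recorded nothing, so it does not appear in the result.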
D. Haban is with the International Computer Science Institute, 1947 Center Street, Berkeley, CA 94704.
D. Wybranietz is with the Department of Computer Science, University of Kaiserslautern, D-6750 Kaiserslautern, W. Germany.
A. Barak is with the International Computer Science Institute, 1947 Center Street, Berkeley, CA 94704, on leave from The Hebrew University of Jerusalem, Israel.
Copyright information
© 1990 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Haban, D., Wybranietz, D., Barak, A. (1990). Monitoring and management-support of distributed systems. In: Schröder-Preikschat, W., Zimmer, W. (eds) Progress in Distributed Operating Systems and Distributed Systems Management. Lecture Notes in Computer Science, vol 433. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-52609-9_80
DOI: https://doi.org/10.1007/3-540-52609-9_80
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-52609-4
Online ISBN: 978-3-540-47074-8
eBook Packages: Springer Book Archive