Recursively Structured Fault-Tolerant Distributed Computing Systems
This paper considers two principles:
(i) a distributed computing system should be functionally equivalent to the individual computing systems of which it is composed; and
(ii) fault-tolerant systems should be constructed from generalised fault-tolerant components.
The reasoning behind these two "recursive structuring principles", and the consequences of attempting to adhere to them, are discussed. Where appropriate, this discussion is illustrated by reference to a distributed system based on UNIX that is now operational at Newcastle and numerous other locations. This system has been implemented by adding a software subsystem, known as the Newcastle Connection, to each of a set of UNIX systems. By this means we have constructed a distributed system that is functionally equivalent, at both the user and the program level, to a conventional uni-processor UNIX system.
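As a rough illustration of the first principle, the Python sketch below models a per-machine layer of the kind described above: it sits between programs and the local kernel, serves local file operations itself, and forwards remote ones to another machine, so a program uses one interface wherever a file lives. Everything here is hypothetical and is not the Newcastle Connection's actual implementation: the `Kernel` and `Connection` classes, the `/../<host>/` path prefix (loosely modeled on the Connection's practice of naming other machines from above the local root), and the direct method call that stands in for a remote procedure call.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Kernel:
    """Stand-in for one machine's local UNIX kernel: a flat map from path to contents."""
    files: dict[str, str] = field(default_factory=dict)

    def read(self, path: str) -> str:
        if path not in self.files:
            raise FileNotFoundError(path)
        return self.files[path]

    def write(self, path: str, data: str) -> None:
        self.files[path] = data


class Connection:
    """Hypothetical layer between programs and the local kernel.

    Paths of the form "/../<host>/<rest>" (an assumed naming convention) are
    forwarded to the named peer; all other paths go to the local kernel.
    """

    def __init__(self, host: str, kernel: Kernel) -> None:
        self.host = host
        self.kernel = kernel
        self.peers: dict[str, Connection] = {}

    def _route(self, path: str) -> tuple[Connection | None, str]:
        # Returns (peer, path on that machine); peer is None for local paths.
        if path.startswith("/../"):
            host, _, rest = path[len("/../"):].partition("/")
            if host == self.host:
                return None, "/" + rest
            if host not in self.peers:
                raise FileNotFoundError(path)
            return self.peers[host], "/" + rest
        return None, path

    def read(self, path: str) -> str:
        peer, local = _unused = self._route(path)
        # A direct method call stands in for a remote procedure call.
        if peer is not None:
            return peer.read(local)
        return self.kernel.read(local)

    def write(self, path: str, data: str) -> None:
        peer, local = self._route(path)
        if peer is not None:
            peer.write(local, data)
        else:
            self.kernel.write(local, data)


# Two machines, each with its own kernel and connection layer.
a = Connection("unix1", Kernel({"/etc/motd": "hello from unix1"}))
b = Connection("unix2", Kernel())
a.peers["unix2"] = b
b.peers["unix1"] = a

a.write("/../unix2/tmp/note", "written via unix1")
print(b.read("/tmp/note"))           # written via unix1
print(b.read("/../unix1/etc/motd"))  # hello from unix1
```

The program on each machine calls the same `read` and `write` whether the file is local or remote; only the path says where the file lives, which is the sense in which the combined system stays functionally equivalent to a single one.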
(Based on the paper "Recursively Structured Distributed Computing Systems" by B. Randell, appearing in IEEE 1983 Proceedings of the Third Symposium on Reliability in Distributed Software and Database Systems, October 17–19, 1983, Clearwater Beach, FL, pp. 3–11. Copyright © 1983 IEEE.)
Keywords: Fault Tolerance; System Call; Atomic Action; Distributed Computing System; Exception Handling