Recursively Structured Fault-Tolerant Distributed Computing Systems

  • B. Randell
Conference paper
Part of the NATO ASI Series book series (volume 22)

Abstract

Two design rules which aid the construction of distributed computing systems and the provision of fault tolerance are described, namely that:
  (i) a distributed computing system should be functionally equivalent to the individual computing systems of which it is composed, and
  (ii) fault tolerant systems should be constructed from generalised fault tolerant components.

The reasoning behind these two “recursive structuring principles”, and the consequences of attempting to adhere to them, are discussed. Where appropriate this discussion is illustrated by reference to a distributed system based on UNIX that is now operational at Newcastle and numerous other locations. This system has been implemented by adding a software subsystem, known as the Newcastle Connection, to each of a set of UNIX systems. By this means we have constructed a distributed system which is functionally equivalent at both the user and the program level to a conventional uni-processor UNIX system.
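
The first structuring principle can be illustrated with a small sketch. It is not the Newcastle Connection itself: the class names (`FileSystem`, `LocalSystem`, `CombinedSystem`) and the path-prefix routing are assumptions made purely for illustration. The point it tries to show is only that a combination of systems offers the same interface as each of its parts, and so can in turn be a part of a larger combination.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class FileSystem(ABC):
    """Hypothetical interface shared by one machine and by a combined system."""

    @abstractmethod
    def read(self, path: str) -> str: ...

    @abstractmethod
    def write(self, path: str, data: str) -> None: ...


class LocalSystem(FileSystem):
    """A single stand-alone system; files are held in a dictionary."""

    def __init__(self) -> None:
        self._files: dict[str, str] = {}

    def read(self, path: str) -> str:
        return self._files[path]

    def write(self, path: str, data: str) -> None:
        self._files[path] = data


class CombinedSystem(FileSystem):
    """Several systems joined together (illustrative assumption).

    It implements the same interface as its members, so a CombinedSystem
    may itself be a member of another CombinedSystem.
    """

    def __init__(self, members: dict[str, FileSystem]) -> None:
        self._members = members

    def _route(self, path: str) -> tuple[FileSystem, str]:
        # Split "/name/rest" into the member called "name" and "/rest".
        name, _, rest = path.lstrip("/").partition("/")
        return self._members[name], "/" + rest

    def read(self, path: str) -> str:
        member, rest = self._route(path)
        return member.read(rest)

    def write(self, path: str, data: str) -> None:
        member, rest = self._route(path)
        member.write(rest, data)


# Usage: two machines form a site, and the site is a member of a larger system.
a, b = LocalSystem(), LocalSystem()
site = CombinedSystem({"a": a, "b": b})
world = CombinedSystem({"newcastle": site})
world.write("/newcastle/a/etc/motd", "hello")
```

A caller holding `world`, `site`, or `a` uses the same `read` and `write` operations in each case, which is the sense of "functionally equivalent" that the sketch is meant to convey.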

(Based on the paper “Recursively Structured Distributed Computing Systems” by B. Randell appearing in Proceedings of the Third Symposium on Reliability in Distributed Software and Database Systems, IEEE, October 17–19, 1983, Clearwater Beach, FL, pp. 3–11. Copyright © 1983 IEEE.)

Copyright information

© Springer-Verlag Berlin Heidelberg 1986

Authors and Affiliations

  • B. Randell (1)
  1. Computing Laboratory, University of Newcastle upon Tyne, UK
