
A Linguistic Approach to Failure Handling in Distributed Systems

  • Richard D. Schlichting
  • Flaviu Cristian
  • Titus D. M. Purdin
Part of the Dependable Computing and Fault-Tolerant Systems book series (DEPENDABLECOMP, volume 4)

Abstract

Distributed computer systems are increasingly being used to control critical applications. An important aspect of constructing dependable systems for such use is ensuring that the system software is robust to failures in the underlying computing platform. One property that makes failures difficult to handle in this environment is that they can occur concurrently with other system events. This paper describes a language-based approach for constructing system software that copes with such asynchrony in a systematic manner. The basic idea is to treat failures as just another class of events, handled in the same way as normal events. Linguistic constructs for handling such failure events are then proposed; these constructs can be added to distributed programming languages with minimal impact. To make our ideas precise, we use the SR distributed programming language as a basis for incorporating these constructs. The approach is illustrated by a detailed presentation of a replicated directory management program written in the extended SR language.
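The central idea, treating a failure as one more event delivered to the same handler as ordinary requests, can be illustrated outside SR. The following is a minimal Python sketch under stated assumptions, not the paper's SR extension: the event classes Insert, Lookup and Failure, the DirectoryManager class and the run loop are all hypothetical names, and failure detection itself is assumed to happen elsewhere (for example, in a membership service) that simply enqueues a Failure event. Because every event passes through a single dispatch point, a failure is never observed in the middle of another operation.

```python
from collections import deque
from dataclasses import dataclass, field


# Hypothetical event types; illustrative only, not SR language constructs.
@dataclass
class Insert:
    name: str
    value: str


@dataclass
class Lookup:
    name: str


@dataclass
class Failure:
    replica: str  # identifier of a replica whose failure was detected elsewhere


@dataclass
class DirectoryManager:
    replicas: set                       # replicas currently believed to be up
    table: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def handle(self, event):
        # A single dispatch point for normal and failure events alike:
        # each event is processed to completion before the next one starts.
        if isinstance(event, Insert):
            self.table[event.name] = event.value
            self.log.append(f"insert {event.name} on {sorted(self.replicas)}")
        elif isinstance(event, Lookup):
            self.log.append(f"lookup {event.name} -> {self.table.get(event.name)}")
        elif isinstance(event, Failure):
            # A failure just updates membership, like any other state change.
            self.replicas.discard(event.replica)
            self.log.append(f"failure {event.replica}")
        else:
            raise TypeError(f"unknown event {event!r}")


def run(manager, events):
    # Events (requests and failure notifications) share one FIFO queue.
    queue = deque(events)
    while queue:
        manager.handle(queue.popleft())
    return manager.log


mgr = DirectoryManager(replicas={"r1", "r2", "r3"})
for line in run(mgr, [Insert("a", "x"), Failure("r2"), Insert("b", "y"), Lookup("a")]):
    print(line)
```

In this sketch the insert that follows the failure of r2 is applied only to r1 and r3, because the failure event was serialized with the requests rather than interrupting one of them. A real system would also need to handle replica recovery and the ordering guarantees of the underlying communication layer, which are outside the scope of this illustration.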

Keywords

Stable Storage, Directory Manager, Binding Variable, Event Handler, Event Description
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag/Wien 1991

Authors and Affiliations

  • Richard D. Schlichting (1)
  • Flaviu Cristian (2)
  • Titus D. M. Purdin (3)
  1. Department of Computer Science, The University of Arizona, Tucson, USA
  2. IBM Almaden Research Center, San Jose, USA
  3. Department of Management Information Systems, The University of Arizona, Tucson, USA