Fault Treatment and Continued Service

  • Peter Alan Lee
  • Thomas Anderson
Part of the Dependable Computing and Fault-Tolerant Systems book series (DEPENDABLECOMP, volume 3)

Abstract

By means of techniques for error detection, damage assessment and error recovery, a fault-tolerant system aims to ensure that any errors introduced into the system state are removed. If these techniques succeed in placing the system in an error-free state, the system can return to normal operation, since the immediate danger of failure has been averted. However, this may not be enough to ensure reliability. The measures and mechanisms employed in the first three phases of fault tolerance are (necessarily) concerned with errors in the system, but errors are merely the symptoms produced by a fault; techniques which cope with errors, such as those described in the previous chapter, leave the fault which produced those errors untreated. Given that a fault has already inflicted damage on the system state, there is clearly a possibility that the fault will continue to produce errors. Repeated manifestations of a fault can force a system to fail despite the efforts of the fault tolerance techniques described so far, either because the consequences of the fault become more and more serious, or because the system is so heavily engaged in coping with recurring errors that it fails to maintain its specified service.

Keywords

Fault Tolerance, Fault Location, Fault Treatment, Design Fault, Transient Fault
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag/Wien 1990

Authors and Affiliations

  • Peter Alan Lee (1)
  • Thomas Anderson (1)
  1. Computing Laboratory, University of Newcastle upon Tyne, UK
