Fault Tolerance

  • Peter Alan Lee
  • Thomas Anderson
Part of the Dependable Computing and Fault-Tolerant Systems book series (DEPENDABLECOMP, volume 3)


Reliability is a desirable attribute of any computing system, and a requirement for some systems, as discussed in Chapter 1. Although failure-free operation is the goal, there is no guarantee that a system will be free from faults and their effects during its operational lifetime. Even setting financial considerations aside, quality assurance cannot guarantee that system components will not fail, and fault prevention is unlikely to eliminate all design faults from a complex system. To provide reliability despite the presence of faults, measures for fault tolerance must therefore be adopted.







Copyright information

© Springer-Verlag/Wien 1990

Authors and Affiliations

  • Peter Alan Lee (1)
  • Thomas Anderson (1)
  1. Computing Laboratory, University of Newcastle upon Tyne, UK
