Software Fault Tolerance

  • Peter Alan Lee
  • Thomas Anderson
Part of the Dependable Computing and Fault-Tolerant Systems book series (DEPENDABLECOMP, volume 3)

Abstract

Fault tolerance techniques for coping with the occurrence and effects of anticipated hardware component failures are now well established and form a vital part of any reliable computing system. However, it is more unusual to find that strategies for fault tolerance have been included in a system for coping with design faults, although such strategies are becoming increasingly common in systems with high reliability requirements. For instance, applications in railway systems, nuclear reactor control and aircraft control are reported by Voges [1]. Design faults may not have been a problem in hardware systems (or at least not recognized as such) but are of major concern in software systems.

Keywords

  • Primary Module
  • Alternate Module
  • Acceptance Test
  • Design Fault
  • Software Fault
These keywords were added by machine and not by the authors.

References

  1. U. Voges (ed.), Software Diversity in Computerized Control Systems, Springer-Verlag, Wien (1988).
  2. J.G. Robinson and E.S. Roberts, “Software Fault-Tolerance in the Pluribus,” AFIPS Conference Proceedings 1978 NCC 47, Anaheim (CA), pp. 563–569 (June 1978).
  3. J.H. Wensley et al., “SIFT: Design and Analysis of a Fault-Tolerant Computer for Aircraft Control,” Proceedings of the IEEE 66 (10), pp. 1240–1255 (October 1978).
  4. J.J. Horning et al., “A Program Structure for Error Detection and Recovery,” pp. 171–187 in Lecture Notes in Computer Science 16, (ed. E. Gelenbe and C. Kaiser), Springer-Verlag, Berlin (1974).
  5. T. Anderson and R. Kerr, “Recovery Blocks in Action: A System Supporting High Reliability,” Proceedings of 2nd International Conference on Software Engineering, San Francisco (CA), pp. 447–457 (October 1976).
  6. P.A. Lee, N. Ghani, and K. Heron, “A Recovery Cache for the PDP-11,” IEEE Transactions on Computers C-29 (6), pp. 546–549 (June 1980).
  7. F. Cristian, “Exception Handling and Software-Fault Tolerance,” Digest of Papers FTCS-10: 10th International Symposium on Fault-Tolerant Computing Systems, Kyoto, pp. 97–103 (October 1980).
  8. P.M. Melliar-Smith and B. Randell, “Software Reliability: The Role of Programmed Exception Handling,” SIGPLAN Notices 12 (3), pp. 95–100 (March 1977).
  9. D.E. Knuth, The Art of Computer Programming Vols. 1–3, Addison-Wesley, Reading (MA) (1968).
  10. T. Gilb, “Distinct Software: A Redundancy Technique for Reliable Software,” pp. 117–133 in State of the Art Report on Software Reliability, Infotech, Maidenhead (1977).
  11. H. Kopetz, “Software Redundancy in Real Time Systems,” IFIP Congress 74, Stockholm, pp. 182–186 (August 1974).
  12. M.A. Fischler, O. Firschein, and D.L. Drew, “Distinct Software: An Approach to Reliable Computing,” Proceedings of Second USA-Japan Computer Conference, Tokyo, pp. 573–579 (August 1975).
  13. H. Hecht, “Fault Tolerant Software for Real-Time Applications,” Computing Surveys 8 (4), pp. 391–407 (December 1976).
  14. A.B. Long et al., “A Methodology for the Development and Validation of Critical Software for Nuclear Power Plants,” Proceedings COMPSAC 77, Chicago (IL), pp. 620–626 (November 1977).
  15. O.B. von Linde, “Computers Can Now Perform Vital Functions Safely,” Railway Gazette International 135 (11), pp. 1004–1006 (November 1979).
  16. J.P.J. Kelly and A. Avizienis, “A Specification-Oriented Multi-Version Software Experiment,” Digest of Papers FTCS-13: Thirteenth Annual International Symposium on Fault-Tolerant Computing, Milano, pp. 120–126 (June 1983).
  17. T. Anderson et al., “Software Fault Tolerance: An Evaluation,” IEEE Transactions on Software Engineering SE-11 (12), pp. 1502–1510 (December 1985).
  18. J.C. Knight and N.G. Leveson, “An Experimental Evaluation of the Assumption of Independence in Multiversion Programming,” IEEE Transactions on Software Engineering SE-12 (1), pp. 96–109 (January 1986).
  19. D.E. Eckhardt and L.D. Lee, “A Theoretical Basis for the Analysis of Multiversion Software Subject to Coincident Errors,” IEEE Transactions on Software Engineering SE-11 (12), pp. 1511–1517 (December 1985).
  20. B. Littlewood and D.R. Miller, “A Conceptual Model of the Effect of Diverse Methodologies on Coincident Failures in Multi-version Software,” pp. 321–333 in Measurement for Software Control and Assurance, (ed. B.A. Kitchenham and B. Littlewood), Elsevier Applied Science (1989).
  21. E. Best and F. Cristian, “Systematic Detection of Exception Occurrences,” Technical Report 165, Computing Laboratory, University of Newcastle upon Tyne (April 1981).
  22. R.H. Campbell, K.H. Horton, and G.G. Belford, “Simulations of a Fault-Tolerant Deadline Mechanism,” Digest of Papers FTCS-9: Ninth Annual International Symposium on Fault-Tolerant Computing, Madison (WI), pp. 95–101 (June 1979).
  23. E.J. Salzman, “An Experiment in Producing Highly Reliable Software,” M.Sc. Dissertation, Computing Laboratory, University of Newcastle upon Tyne (1978).
  24. S.K. Shrivastava and A.A. Akinpelu, “Fault Tolerant Sequential Programming Using Recovery Blocks,” Digest of Papers FTCS-8: Eighth Annual International Conference on Fault-Tolerant Computing, Toulouse, p. 207 (June 1978).
  25. H.O. Welch, “Distributed Recovery Block Performance in a Real-Time Control Loop,” Proceedings of Real-Time Systems Symposium, Arlington (VA), pp. 268–276 (1983).
  26. A. Avizienis, “The N-Version Approach to Fault-Tolerant Software,” IEEE Transactions on Software Engineering SE-11 (12), pp. 1491–1501 (December 1985).
  27. L. Chen and A. Avizienis, “N-Version Programming: A Fault-Tolerance Approach to Reliability of Software Operation,” Digest of Papers FTCS-8: Eighth Annual International Conference on Fault-Tolerant Computing, Toulouse, pp. 3–9 (June 1978).
  28. S.S. Brilliant, J.C. Knight, and N.G. Leveson, “The Consistent Comparison Problem in N-Version Software,” ACM SIGSOFT Software Engineering Notes 12 (1), pp. 29–34 (January 1987).
  29. A. Avizienis and L. Chen, “On the Implementation of N-Version Programming for Software Fault-Tolerance During Program Execution,” Proceedings COMPSAC 77, Chicago (IL), pp. 149–155 (November 1977).
  30. J.C. Knight and N.G. Leveson, “An Empirical Study of Failure Probabilities in Multi-Version Software,” Digest of Papers FTCS-16: Sixteenth Annual International Symposium on Fault-Tolerant Computing, Wien, pp. 165–170 (July 1986).
  31. A. Avizienis, “DEDIX 87 – A Supervisory System for Design Diversity Experiments at UCLA,” pp. 129–168 in Software Diversity in Computerized Control Systems, (ed. U. Voges), Springer-Verlag, Wien (1988).
  32. K.S. Tso and A. Avizienis, “Community Error Recovery in N-Version Software: A Design Study With Experimentation,” Digest of Papers FTCS-17: Seventeenth Annual International Symposium on Fault-Tolerant Computing, Pittsburgh, pp. 127–133 (July 1987).
  33. R.M. Sedmak and H.L. Liebergot, “Fault-Tolerance of a General Purpose Computer Implemented by Very Large Scale Integration,” IEEE Transactions on Computers C-29 (6), pp. 492–500 (June 1980).
  34. P. Traverse, “AIRBUS and ATR System Architecture and Specification,” pp. 95–104 in Software Diversity in Computerized Control Systems, (ed. U. Voges), Springer-Verlag, Wien (1988).
  35. P.G. Bishop, “The PODS Diversity Experiment,” pp. 51–84 in Software Diversity in Computerized Control Systems, (ed. U. Voges), Springer-Verlag, Wien (1988).
  36. J.R. Garman, “The Bug Heard Around The World,” ACM Software Engineering Notes 6 (5), pp. 3–10 (October 1981).
  37. G. Hagelin, “ERICSSON Safety System For Railway Control,” pp. 11–22 in Software Diversity in Computerized Control Systems, (ed. U. Voges), Springer-Verlag, Wien (1988).

Copyright information

© Springer-Verlag/Wien 1990

Authors and Affiliations

  • Peter Alan Lee (1)
  • Thomas Anderson (1)
  1. Computing Laboratory, University of Newcastle upon Tyne, UK
