Fault Tolerant Systems

Part of the Springer Series in Reliability Engineering book series (RELIABILITY)


In the 21st century, hardly any industry or service organization operates without the help of an embedded software system. This dependence on software has made it necessary to produce software that is ever more reliable. The complex safety-critical systems now being designed and built are often difficult, multidisciplinary undertakings, and a computer control system is frequently part of them. To ensure that these systems perform without failure, even under extreme conditions, extremely high reliability must be built into both their hardware and their software. There are many real-life examples in which failures in the computer systems of safety-critical systems have caused spectacular failures, resulting in calamitous loss of life and economic damage.


Fault Tolerance, Acceptance Test, Common Fault, Reliability Growth, Testing Segment



Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  1. Department of Operational Research, University of Delhi, Delhi, India
  2. Department of Industrial and Systems Engineering, Rutgers University, Piscataway, USA
