Enhancing Fault Tolerance of Real-Time Systems through Time Redundancy

  • Sandra R. Thuel
  • Jay K. Strosnider
Part of The Kluwer International Series in Engineering and Computer Science book series (SECS, volume 285)


Fault-tolerant, real-time systems require correct, time-constrained results in the presence of faults. Missed deadlines in many high dependability systems can result in significant property damage or loss of human life. Historically, designers relied almost exclusively upon massive hardware replication to achieve their dependability goals. Research suggests that not only is this approach inadequate for dealing with certain fault classes, but also that it is inappropriate for many applications with strict space, weight, and cost constraints. Alternatively, time redundancy can be used to complement replication as a means to improve fault coverage and reduce the required level of replication for fault-tolerant system design. Although previous work has advocated the use of time redundancy to provide protection against hardware and software faults, there exists no formal methodology for allocating and managing such time. This chapter provides an overview of recent work in developing a comprehensive analytical framework for allocating and managing time redundancy to preserve the timing correctness of priority-driven, real-time systems in the presence of faults.


Keywords: Priority Level, Periodic Task, Recovery Operation, Aperiodic Task, Schedule Overhead





Copyright information

© Kluwer Academic Publishers 1994

Authors and Affiliations

  • Sandra R. Thuel (1)
  • Jay K. Strosnider (2)
  1. AT&T Bell Laboratories, Holmdel
  2. Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh
