Scheduling Fault Recovery Operations for Time-Critical Applications

  • Sandra Ramos-Thuel
  • Jay K. Strosnider
Part of the Dependable Computing and Fault-Tolerant Systems book series (DEPENDABLECOMP, volume 9)


This paper introduces algorithms for scheduling fault recovery operations on systems that must preserve the timing correctness of critical application tasks in the presence of faults. The algorithms are based on methods to reserve time for the processing of recovery tasks at the design stage. This allows recovery tasks to be scheduled with very low run-time overhead, complementing or reducing the need for hardware replication to support dependable operation. Although previous work has advocated the use of reservation methods, there exists no formal methodology for allocating such time. A methodology is developed which facilitates the difficult task of verifying the timing correctness of a desired reservation strategy. In addition, simulation results are presented which give insight into the effectiveness of different reservation strategies in averting timing failures under a variety of transient recovery loads.
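As a rough illustration of what verifying a reservation strategy can involve, the sketch below treats the reserved recovery time as one extra periodic task and runs a standard fixed-priority response-time iteration over the combined task set. This is not the paper's algorithm, which the abstract does not spell out. The function names, the `(wcet, period)` tuple layout, the rate-monotonic priority order, and the assumption that deadlines equal periods are all hypothetical choices made for the example.

```python
def response_time(tasks, i):
    """Worst-case response time of tasks[i] under fixed priorities.

    tasks is ordered highest priority first; each entry is (wcet, period)
    in integer time units, with the deadline assumed equal to the period.
    Returns None if the task can miss its deadline.
    """
    c_i, t_i = tasks[i]
    r = c_i
    while True:
        # Interference from all higher-priority tasks released in [0, r).
        # -(-r // t_j) is an integer ceiling of r / t_j.
        interference = sum(-(-r // t_j) * c_j for c_j, t_j in tasks[:i])
        r_next = c_i + interference
        if r_next > t_i:
            return None          # deadline exceeded
        if r_next == r:
            return r             # fixed point reached
        r = r_next


def schedulable_with_reservation(app_tasks, reserve):
    """Check application tasks plus a recovery reservation (assumed model).

    reserve is a hypothetical (budget, period) pair modelled as an
    ordinary periodic task; priorities follow rate-monotonic order.
    """
    tasks = sorted(app_tasks + [reserve], key=lambda t: t[1])
    return all(response_time(tasks, i) is not None for i in range(len(tasks)))


if __name__ == "__main__":
    app = [(1, 4), (2, 6)]
    print(schedulable_with_reservation(app, (1, 5)))  # small reservation
    print(schedulable_with_reservation(app, (3, 5)))  # oversized reservation
```

Under these assumptions, growing the reserved budget until the check fails gives a crude upper limit on how much recovery time a given application task set can tolerate.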


Inertial Navigation System; Periodic Task; Periodic Load; Application Task; Schedulability Test
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag/Wien 1995

Authors and Affiliations

  • Sandra Ramos-Thuel (1)
  • Jay K. Strosnider (1)
  1. Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, USA
