Journal of Scheduling, Volume 12, Issue 5, pp 501–515

Algorithms for testing fault-tolerance of sequenced jobs

  • Marek Chrobak
  • Mathilde Hurand
  • Jiří Sgall
Open Access


We study the problem of testing whether a given set of sequenced jobs can tolerate transient faults. We present efficient algorithms for this problem in several fault models. A fault model describes what types of faults are allowed and specifies assumptions on their frequency. Two types of faults are considered: hidden faults, which can only be detected after a job completes, and exposed faults, which can be detected immediately.

First, we give an O(n)-time fault-tolerance testing algorithm, for both exposed and hidden faults, if the number of faults does not exceed a given parameter k.
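To make the testing problem concrete, the following is a minimal brute-force sketch for the bounded-fault model with hidden faults. It is not the O(n)-time algorithm announced above, and the abstract does not spell out the job model, so everything here is an assumption made for illustration: each job is a hypothetical `(release, length, deadline)` triple, jobs run in the given order without preemption, and each hidden fault forces a full re-execution of the faulty job. Under those assumptions, a dynamic program over the number of faults spent so far finds the worst-case completion time of every job in O(nk²) time.

```python
from typing import List, Tuple

# Assumed job model (not taken from the abstract): (release, length, deadline).
Job = Tuple[float, float, float]


def tolerates_k_hidden_faults(jobs: List[Job], k: int) -> bool:
    """Return True if every job meets its deadline under any pattern of at most k hidden faults.

    Assumptions: jobs run in list order, job j starts at
    max(release_j, completion of job j-1), and each hidden fault
    costs one full re-execution of the job it hits.
    """
    # worst[f] = latest possible completion time of the previous job
    # when at most f faults have occurred so far.
    worst = [float("-inf")] * (k + 1)
    for release, length, deadline in jobs:
        new_worst = []
        for f in range(k + 1):
            # Try every split: g faults hit this job, f - g hit earlier jobs.
            latest = max(
                max(release, worst[f - g]) + (g + 1) * length
                for g in range(f + 1)
            )
            new_worst.append(latest)
        # The worst case for this job uses the full fault budget k.
        if new_worst[k] > deadline:
            return False
        worst = new_worst
    return True
```

For example, with the two jobs `[(0, 2, 6), (3, 1, 8)]` the sequence tolerates two faults but not three, because three faults on the first job push its completion time to 8, past its deadline of 6.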

Then we consider the model in which any two faults are separated in time by a gap of length at least Δ, where Δ is at least twice the maximum job length. For exposed faults, we give an O(n)-time algorithm. For hidden faults, we give an algorithm with running time O(n²), and we prove that if job lengths are distributed uniformly over an interval [0, p_max], then this algorithm’s expected running time is O(n). Our experimental study shows that this linear-time performance extends to other distributions. Finally, we provide evidence that improving the worst-case performance may not be possible, by proving an Ω(n²) lower bound, in the algebraic computation tree model, on a slight generalization of this problem.


Keywords: Scheduling, Fault-tolerance, Real-time systems, Algorithms



Copyright information

© The Author(s) 2009

Authors and Affiliations

  1. Department of Computer Science, University of California, Riverside, USA
  2. Department d’Informatique (LIX), Ecole Polytechnique, Palaiseau, France
  3. Department of Applied Mathematics, Charles University, Praha 1, Czech Republic
