Algorithms for testing fault-tolerance of sequenced jobs
We study the problem of testing whether a given set of sequenced jobs can tolerate transient faults. We present efficient algorithms for this problem in several fault models. A fault model describes what types of faults are allowed and specifies assumptions on their frequency. Two types of faults are considered: hidden faults, which can be detected only after a job completes, and exposed faults, which can be detected immediately.
First, we give an O(n)-time fault-tolerance testing algorithm, for both exposed and hidden faults, if the number of faults does not exceed a given parameter k.
Then we consider the model in which any two faults are separated in time by a gap of length at least Δ, where Δ is at least twice the maximum job length. For exposed faults, we give an O(n)-time algorithm. For hidden faults, we give an algorithm with running time O(n^2), and we prove that if job lengths are distributed uniformly over an interval [0, p_max], then this algorithm’s expected running time is O(n). Our experimental study shows that this linear-time performance extends to other distributions. Finally, we provide evidence that improving the worst-case performance may not be possible, by proving an Ω(n^2) lower bound, in the algebraic computation tree model, on a slight generalization of this problem.
Keywords: Scheduling, Fault-tolerance, Real-time systems, Algorithms