Performance Evaluation of a Two Level Error Recovery Scheme for Distributed Systems
Rollback recovery schemes are used in fault-tolerant distributed systems to minimize the computation lost when failures occur. One-level recovery schemes do not distinguish between types of failures or their relative frequency of occurrence, and therefore tolerate all failures with the same overhead. Two-level recovery schemes aim to provide low-overhead protection against the more probable failures, while protecting against other failures with possibly higher overhead. In this paper, we analyze a two-level recovery scheme due to Vaidya, taking the probability of task completion on a system with limited repairs as the performance metric.