Abstract
From the 1970s until the early 1990s a large body of work on modelling checkpointing was published, and this section covers the most important models, insights, algorithms and results. Models of checkpointing differ in the granularity at which the environment is included in the model, if it is included at all. They also differ in which components of the model are considered deterministic and which are modelled as random variables, and in which system characteristics are included. Some models allow for failures both while checkpoints are being taken and during recovery, some allow for only one of the two, and some for neither. Some models assume checkpoints to be equidistant, others consider them to be taken at random time intervals. There are many ways to organise and structure existing work on the modelling of checkpointing. We use the structure given in [113], where checkpointing schemes are divided into system-level and program-level checkpointing. In [113] only program-level checkpointing is treated in detail. Here we summarise existing work in both fields.
Notes
1. For a discussion of convexity in checkpointing see [30].
References
K.M. Chandy, C.V. Ramamoorthy, Rollback and recovery strategies for computer programs. IEEE Trans. Comput. C-21(6), 546–556 (1972)
R. Geist, R. Reynolds, J. Westall, Selection of a checkpoint in a critical-task environment. IEEE Trans. Reliab. 37(4), 395–400 (1988)
P. L’Ecuyer, J. Malenfant, Computing optimal checkpointing strategies for rollback and recovery systems. IEEE Trans. Comput. 37(4), 491–496 (1988)
T. Dohi, N. Kaio, K.S. Trivedi, Availability Models with Age-Dependent Checkpointing. In SRDS’02: Proceedings of the 21st IEEE Symposium on Reliable Distributed Systems, Suita, Japan (IEEE Computer Society, Washington, DC, 2002), pp. 130–139
E.G. Coffman Jr., E.N. Gilbert, Optimal strategies for scheduling checkpoints and preventive maintenance. IEEE Trans. Reliab. 39(1), 9–18 (1990)
A. Duda, The effects of checkpointing on program execution time. Inform. Process. Lett. 16(5), 221–229 (1983)
M.J. Magazine, Optimality of intuitive checkpointing policies. Inform. Process. Lett. 17(2), 63–66 (1983)
A.N. Tantawi, M. Ruschitzka, Performance analysis of checkpointing strategies. ACM Trans. Comput. Syst. 2(2), 123–144 (1984)
V. Grassi, L. Donatiello, S. Tucci, On the optimal checkpointing of critical tasks and transaction-oriented systems. IEEE Trans. Software Eng. 18(1), 72–77 (1992)
J. Hong, S. Kim, Y. Cho, Cost analysis of optimistic recovery model for forked checkpointing. IEICE Trans. Inform. Syst. E-85A(1), 1534–1541 (2002)
V.F. Nicola, Chapter 7: Checkpointing and the modeling of program execution time, in Software Fault Tolerance, ed. by M.R. Lyu. Trends in Software, vol. 3 (Wiley, Chichester, 1995), pp. 167–188
S. Toueg, Ö. Babaoglu, On the optimum checkpoint selection problem. SIAM J. Comput. 13(3), 630–649 (1984)
L. Wang, K. Pattabiraman, Z. Kalbarczyk, R.K. Iyer, L. Votta, C. Vick, A. Wood, Modeling Coordinated Checkpointing for Large-Scale Supercomputers. In DSN’05: Proceedings of the Dependable Systems and Networks, Yokohama, Japan (IEEE Computer Society, Washington, DC, 2005), pp. 812–821
E. Gelenbe, D. Derochette, Performance of rollback recovery systems under intermittent failures. Commun. ACM 21(6), 493–499 (1978)
J.W. Young, A first order approximation to the optimum checkpoint interval. Commun. ACM 17(9), 530–531 (1974)
N.H. Vaidya, Impact of checkpoint latency on overhead ratio of a checkpointing scheme. IEEE Trans. Comput. 46(8), 942–947 (1997)
A. Brock, An analysis of checkpointing. ICL Tech. J. 1(3), 211–228 (1979)
R. Koo, S. Toueg, Checkpointing and rollback-recovery for distributed systems. IEEE Trans. Software Eng. SE-13(1), 23–31 (1987)
A. Ziv, J. Bruck, An on-line algorithm for checkpoint placement. IEEE Trans. Comput. 46(9), 976–985 (1997)
E. Gelenbe, On the optimum checkpoint interval. J. ACM 26(2), 259–270 (1979)
E. Gelenbe, M. Hernández, Optimum checkpoints with age dependent failures. Acta Informatica 27, 519–531 (1990)
C.H.C. Leung, Q.H. Choo, On the execution of large batch programs in unreliable computing systems. IEEE Trans. Software Eng. 10(4), 444–450 (1984)
C.M. Krishna, Y.-H. Lee, K.G. Shin, Optimization criteria for checkpoint placement. Commun. ACM 27(10), 1008–1012 (1984)
K.M. Chandy, J.C. Browne, C.W. Dissly, W.R. Uhrig, Analytic models for rollback and recovery strategies in data base systems. IEEE Trans. Software Eng. SE-1(1), 100–110 (1975)
F. Baccelli, Analysis of a service facility with periodic checkpointing. Acta Informatica 15, 67–81 (1981)
A. Goyal, V.F. Nicola, A.N. Tantawi, K.S. Trivedi, Reliability of systems with limited repairs. IEEE Trans. Reliab. R-36(2), 202–207 (1987)
V.G. Kulkarni, V.F. Nicola, K.S. Trivedi, Effects of checkpointing and queueing on program performance. Commun. Stochastic Models 6(4), 615–648 (1990)
K.G. Shin, T.H. Lin, Y.H. Lee, Optimal checkpointing of real-time tasks. IEEE Trans. Comput. C-36(11), 1328–1341 (1987)
K.M. Chandy, A survey of analytic models of rollback and recovery strategies. Computer 8(5), 40–47 (1975)
K.S. Trivedi, Probability and Statistics with Reliability, Queuing, and Computer Science Applications (Wiley, New York, 2001)
K. Kant, A Global Checkpointing Model for Error Recovery. In AFIPS’83: Proceedings of the National Computer Conference, Anaheim, CA, 16–19 May, 1983 (ACM Press, New York, 1983), pp. 81–89
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Wolter, K. (2010). Stochastic Models for Checkpointing. In: Stochastic Models for Fault Tolerance. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11257-7_9
DOI: https://doi.org/10.1007/978-3-642-11257-7_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11256-0
Online ISBN: 978-3-642-11257-7
eBook Packages: Computer Science, Computer Science (R0)