Stochastic Models for Checkpointing

Abstract

From the 1970s until the early 1990s, a large body of work on modelling checkpointing was published, and in this section we cover the most important models, insights, algorithms and results. Models of checkpointing differ in the granularity at which the environment is included in the model, if it is included at all. They also differ in which components of the model are considered deterministic and which are modelled as random variables, and in which system characteristics are included. Some models allow both for checkpoints being taken and for failures during recovery; others allow for only one of the two, or for neither. Some models assume checkpoints to be equidistant, while others consider them to be taken at random time intervals. There are many ways to organise and structure existing work on the modelling of checkpointing. We use the structure given in [113], where checkpointing schemes are divided into system level and program level checkpointing. In [113] only program level checkpointing is treated in detail. Here we summarise existing work in both fields.
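
As a small illustration of the equidistant case, the sketch below evaluates Young's first-order approximation [15] of the optimum checkpoint interval, T ≈ sqrt(2·C·M), where C is the time to save a checkpoint and M is the mean time between failures. The function names, the example values and the simplified overhead expression C/T + T/(2M) are assumptions made for this illustration only; they are not taken from the chapter, and the more detailed models surveyed here refine this approximation.

```python
import math


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """Young's first-order approximation of the optimum checkpoint interval.

    checkpoint_cost: time needed to save one checkpoint (assumed constant).
    mtbf: mean time between failures (same time unit as checkpoint_cost).
    """
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def first_order_overhead(interval: float, checkpoint_cost: float, mtbf: float) -> float:
    """Approximate fraction of time lost per unit of useful work.

    The first term is the cost of taking checkpoints; the second is the
    expected rework after a failure (on average half an interval is lost).
    This is a simplified expression assumed for illustration.
    """
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


if __name__ == "__main__":
    cost, mtbf = 2.0, 100.0  # hypothetical values, e.g. in minutes
    t_opt = young_interval(cost, mtbf)
    print(f"approximate optimum interval: {t_opt:.2f}")
    for t in (t_opt / 2, t_opt, t_opt * 2):
        print(f"interval {t:6.2f} -> overhead {first_order_overhead(t, cost, mtbf):.4f}")
```

Under these assumptions, checkpointing either much more or much less often than the approximate optimum increases the overhead, which is the trade-off that the equidistant models formalise.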

Notes

  1. For a discussion of convexity in checkpointing see [30].

References

  1. K.M. Chandy, C.V. Ramamoorthy, Rollback and recovery strategies for computer programs. IEEE Trans. Comput. C-21(6), 546–556 (1972)

  2. R. Geist, R. Reynolds, J. Westall, Selection of a checkpoint in a critical-task environment. IEEE Trans. Reliab. 37(4), 395–400 (1988)

  3. P. L’Ecuyer, J. Malenfant, Computing optimal checkpointing strategies for rollback and recovery systems. IEEE Trans. Comput. 37(4), 491–496 (1988)

  4. T. Dohi, N. Kaio, K.S. Trivedi, Availability Models with Age-Dependent Checkpointing. In SRDS’02: Proceedings of the 21st IEEE Symposium on Reliable Distributed Systems, Suita, Japan (IEEE Computer Society, Washington, DC, 2002), pp. 130–139

  5. E.G. Coffman Jr., E.N. Gilbert, Optimal strategies for scheduling checkpoints and preventive maintenance. IEEE Trans. Reliab. 39(1), 9–18 (1990)

  6. A. Duda, The effects of checkpointing on program execution time. Inform. Process. Lett. 16(5), 221–229 (1983)

  7. M.J. Magazine, Optimality of intuitive checkpointing policies. Inform. Process. Lett. 17(2), 63–66 (1983)

  8. A.N. Tantawi, M. Ruschitzka, Performance analysis of checkpointing strategies. ACM Trans. Comput. Syst. 2(2), 123–144 (1984)

  9. V. Grassi, L. Donatiello, S. Tucci, On the optimal checkpointing of critical tasks and transaction-oriented systems. IEEE Trans. Software Eng. 18(1), 72–77 (1992)

  10. J. Hong, S. Kim, Y. Cho, Cost analysis of optimistic recovery model for forked checkpointing. IEICE Trans. Inform. Syst. E-85A(1), 1534–1541 (2002)

  11. V.F. Nicola, Chapter 7: Checkpointing and the modeling of program execution time, in Software Fault Tolerance, ed. by M.R. Lyu. Trends in Software, vol. 3 (Wiley, Chichester, 1995), pp. 167–188

  12. S. Toueg, Ö. Babaoglu, On the optimum checkpoint selection problem. SIAM J. Comput. 13(3), 630–649 (1984)

  13. L. Wang, K. Pattabiraman, Z. Kalbarczyk, R.K. Iyer, L. Votta, C. Vick, A. Wood, Modeling Coordinated Checkpointing for Large-Scale Supercomputers. In DSN’05: Proceedings of the Dependable Systems and Networks, Yokohama, Japan (IEEE Computer Society, Washington, DC, 2005), pp. 812–821

  14. E. Gelenbe, D. Derochette, Performance of rollback recovery systems under intermittent failures. Commun. ACM 21(6), 493–499 (1978)

  15. J.W. Young, A first order approximation to the optimum checkpoint interval. Commun. ACM 17(9), 530–531 (1974)

  16. N.H. Vaidya, Impact of checkpoint latency on overhead ratio of a checkpointing scheme. IEEE Trans. Comput. 46(8), 942–947 (1997)

  17. A. Brock, An analysis of checkpointing. ICL Tech. J. 1(3), 211–228 (1979)

  18. R. Koo, S. Toueg, Checkpointing and rollback-recovery for distributed systems. IEEE Trans. Software Eng. SE-13(1), 23–31 (1987)

  19. A. Ziv, J. Bruck, An on-line algorithm for checkpoint placement. IEEE Trans. Comput. 46(9), 976–985 (1997)

  20. E. Gelenbe, On the optimum checkpoint interval. J. ACM 26(2), 259–270 (1979)

  21. E. Gelenbe, M. Hernández, Optimum checkpoints with age dependent failures. Acta Informatica 27, 519–531 (1990)

  22. C.H.C. Leung, Q.H. Choo, On the execution of large batch programs in unreliable computing systems. IEEE Trans. Software Eng. 10(4), 444–450 (1984)

  23. C.M. Krishna, Y.-H. Lee, K.G. Shin, Optimization criteria for checkpoint placement. Commun. ACM 27(10), 1008–1012 (1984)

  24. K.M. Chandy, J.C. Browne, C.W. Dissly, W.R. Uhrig, Analytic models for rollback and recovery strategies in data base systems. IEEE Trans. Software Eng. SE-1(1), 100–110 (1975)

  25. F. Baccelli, Analysis of a service facility with periodic checkpointing. Acta Informatica 15, 67–81 (1981)

  26. A. Goyal, V.F. Nicola, A.N. Tantawi, K.S. Trivedi, Reliability of systems with limited repairs. IEEE Trans. Reliab. R-36(2), 202–207 (1987)

  27. V.G. Kulkarni, V.F. Nicola, K.S. Trivedi, Effects of checkpointing and queueing on program performance. Commun. Stochastic Models 6(4), 615–648 (1990)

  28. K.G. Shin, T.H. Lin, Y.H. Lee, Optimal checkpointing of real-time tasks. IEEE Trans. Comput. C-36(11), 1328–1341 (1987)

  29. K.M. Chandy, A survey of analytic models of rollback and recovery strategies. Computer 8(5), 40–47 (1975)

  30. K.S. Trivedi, Probability and Statistics with Reliability, Queuing, and Computer Science Applications (Wiley, New York, 2001)

  31. K. Kant, A Global Checkpointing Model for Error Recovery. In AFIPS’83: Proceedings of the National Computer Conference, Anaheim, CA, 16–19 May, 1983 (ACM Press, New York, 1983), pp. 81–89

Author information

Correspondence to Katinka Wolter.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Wolter, K. (2010). Stochastic Models for Checkpointing. In: Stochastic Models for Fault Tolerance. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11257-7_9

  • DOI: https://doi.org/10.1007/978-3-642-11257-7_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-11256-0

  • Online ISBN: 978-3-642-11257-7

  • eBook Packages: Computer Science (R0)
