On the Checkpointing Strategy in Desktop Grids

  • Conference paper
Internet and Distributed Computing Systems (IDCS 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7646)

Abstract

Checkpointing is an effective measure to ensure the completion of long-running jobs in Desktop Grids, which are subject to frequent resource failures. We focus on checkpointing strategies in the context of Desktop Grids, including volunteer computing systems, where individual hosts follow diverse failure distributions. We propose an algorithm that computes a sequence of checkpoint interval lengths for each individual host from a sample of that host's availability interval lengths. The algorithm approximates the probability distribution of availability interval lengths directly with the sample, without deriving a closed form of the distribution. In simulations with synthetic trace data and with trace data from a real volunteer computing project, this sample-based strategy outperforms the periodic strategy in terms of wasted time in most cases.
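The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a fixed checkpoint cost `C`, compute lengths restricted to multiples of a grid step `h`, an infinitely long job, zero recovery cost, and independent availability intervals, each of which restarts the checkpoint sequence at time 0. Under those assumptions the expected work saved in one availability interval is the sum of S(T_j) x_j, where x_j is the j-th compute length, T_j is the time at which its checkpoint completes, and S is the empirical survival function of the sample. A dynamic program maximizes that sum, which is equivalent to minimizing expected wasted time. All names (`checkpoint_sequence`, `waste_fraction`, `periodic_intervals`, `young_interval`) are hypothetical. The periodic baseline uses Young's first-order period sqrt(2 C M), with M the sample mean; the abstract does not say which period the paper's periodic strategy uses.

```python
import bisect
import math
import random


def survival(sorted_sample, t):
    """Empirical P(availability interval >= t): the sample stands in for the distribution."""
    n = len(sorted_sample)
    return (n - bisect.bisect_left(sorted_sample, t)) / n


def checkpoint_sequence(sample, C, h):
    """Hypothetical sketch: pick compute lengths x_1, x_2, ... (multiples of h) that
    maximize expected saved work sum_j S(T_j) * x_j, with T_j = sum_{i<=j} (x_i + C)."""
    c = round(C / h)
    if c < 0 or abs(c * h - C) > 1e-9:
        raise ValueError("this sketch assumes C is a non-negative multiple of h")
    s = sorted(sample)
    N = math.ceil(s[-1] / h)              # S(j*h) == 0 for every j > N
    S = [survival(s, j * h) for j in range(N + 1)]
    W = [0.0] * (N + 1)                   # W[i]: best expected saved work from time i*h on
    best = [0] * (N + 1)                  # best[i]: best next compute length, in grid steps
    for i in range(N, -1, -1):            # backward induction over the time grid
        for k in range(1, N + 1):
            j = i + k + c                 # grid index where this segment's checkpoint ends
            if j > N:
                break                     # no sample survives past N*h: nothing to gain
            value = S[j] * k * h + W[j]
            if value > W[i]:
                W[i], best[i] = value, k
    seq, i = [], 0
    while i <= N and W[i] > 0:            # follow the argmax pointers from time 0
        seq.append(best[i] * h)
        i += best[i] + c
    return seq


def periodic_intervals(sample, x, C):
    """Periodic baseline: the same compute length x, repeated past the longest sample."""
    return [x] * (int(max(sample) // (x + C)) + 1)


def waste_fraction(sample, intervals, C):
    """Expected fraction of available time not turned into saved work (checkpoint
    overhead plus lost work), assuming an infinitely long job and no recovery cost."""
    s = sorted(sample)
    t = saved = 0.0
    for x in intervals:
        t += x + C
        p = survival(s, t)
        if p == 0.0:
            break
        saved += p * x
    return 1.0 - saved / (sum(s) / len(s))


def young_interval(C, mtbf):
    """Young's (1974) first-order optimum checkpoint period, sqrt(2 * C * MTBF)."""
    return math.sqrt(2.0 * C * mtbf)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical host: 300 availability intervals drawn from Weibull(scale=200, shape=0.7).
    sample = [rng.weibullvariate(200.0, 0.7) for _ in range(300)]
    C = 10.0
    seq = checkpoint_sequence(sample, C, h=5.0)
    x_young = young_interval(C, sum(sample) / len(sample))
    print("first intervals:", seq[:5])
    print("sample-based waste:", round(waste_fraction(sample, seq, C), 4))
    print("periodic (Young) waste:",
          round(waste_fraction(sample, periodic_intervals(sample, x_young, C), C), 4))
```

Because the dynamic program optimizes over every grid-aligned sequence, including every periodic schedule whose period is a multiple of `h`, its waste measured on the same sample can never exceed that of such a periodic schedule. How well a sequence computed from past availability generalizes to a host's future availability is a separate question, and trace-driven simulation of the kind described in the abstract is what addresses it.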

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, D., Gong, B. (2012). On the Checkpointing Strategy in Desktop Grids. In: Xiang, Y., Pathan, M., Tao, X., Wang, H. (eds) Internet and Distributed Computing Systems. IDCS 2012. Lecture Notes in Computer Science, vol 7646. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34883-9_17

  • DOI: https://doi.org/10.1007/978-3-642-34883-9_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34882-2

  • Online ISBN: 978-3-642-34883-9

  • eBook Packages: Computer Science (R0)