On the Checkpointing Strategy in Desktop Grids

Wang, Dongping; Gong, Bin

doi:10.1007/978-3-642-34883-9_17

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7646)

Included in the following conference series:

International Conference on Internet and Distributed Computing Systems

Abstract

Checkpointing is an effective way to ensure that long-running jobs complete in Desktop Grids, which are subject to frequent resource failures. We focus on checkpointing strategies for Desktop Grids, including volunteer computing systems, where individual hosts follow diverse failure distributions. We propose an algorithm that computes a sequence of checkpoint interval lengths for each individual host from a sample of its availability interval lengths. The algorithm approximates the probability distribution of availability interval lengths directly with the sample, without deriving a closed form of the distribution. In simulations with synthetic trace data and trace data from a real volunteer computing project, this sample-based strategy outperforms the periodic strategy in terms of wasted time in most cases.
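
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the underlying idea: scoring a checkpoint schedule directly against a sample of availability interval lengths instead of against a fitted closed-form distribution. It is not the authors' method. The cost model (checkpoint overhead plus work lost at failure), the function names, and the candidate grid are all assumptions made for illustration. The periodic baseline uses Young's well-known first-order approximation, sqrt(2 * C * MTBF).

```python
import math
from statistics import mean


def wasted_time(intervals, avail_len, ckpt_cost):
    """Wasted time within one availability interval of length avail_len.

    intervals: work lengths between consecutive checkpoints; the last
    entry repeats once the list is exhausted, so [T] is a periodic schedule.
    Assumed cost model (not taken from the paper): every completed
    checkpoint wastes ckpt_cost, and all time after the last completed
    checkpoint is lost when the host fails.
    """
    if min(intervals) <= 0:
        raise ValueError("interval lengths must be positive")
    t = 0.0   # time at which the last checkpoint completed
    done = 0  # number of completed checkpoints
    while True:
        work = intervals[min(done, len(intervals) - 1)]
        end = t + work + ckpt_cost
        if end > avail_len:
            break
        t = end
        done += 1
    return done * ckpt_cost + (avail_len - t)


def empirical_waste(intervals, sample, ckpt_cost):
    """Mean wasted time over the sample, i.e. the expectation under the
    empirical distribution of availability interval lengths."""
    return mean(wasted_time(intervals, a, ckpt_cost) for a in sample)


def young_interval(ckpt_cost, mtbf):
    """Young's first-order approximation of the optimal periodic interval."""
    return math.sqrt(2.0 * ckpt_cost * mtbf)


def best_periodic(sample, ckpt_cost, candidates):
    """Hypothetical helper: the candidate period with the lowest empirical waste."""
    return min(candidates, key=lambda T: empirical_waste([T], sample, ckpt_cost))


if __name__ == "__main__":
    # Made-up availability interval lengths for one host (arbitrary time units).
    sample = [12.0, 45.0, 7.5, 130.0, 60.0, 22.0, 90.0, 15.0]
    cost = 1.0
    periodic = [young_interval(cost, mean(sample))]
    # An arbitrary non-uniform schedule, shown only to compare scores.
    varied = [5.0, 8.0, 12.0, 20.0]
    print("periodic (Young):", empirical_waste(periodic, sample, cost))
    print("non-uniform     :", empirical_waste(varied, sample, cost))
```

Any per-host schedule, periodic or not, can be scored this way against that host's own sample, which is what makes a host-specific comparison possible without assuming a particular failure distribution.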

Author information

Authors and Affiliations

Department of Computer Science and Technology, ShanDong University, Jinan, China
Dongping Wang & Bin Gong

Editor information

Editors and Affiliations

School of Information Technology, Deakin University, Melbourne Burwood Campus, 221 Burwood Highway, 3125, Burwood, VIC, Australia
Yang Xiang
Media Distribution, Telstra Corporation Limited, 21/35 Collins St, 3000, Melbourne, VIC, Australia
Mukaddim Pathan
Department of Mathematics and Computing, The University of Southern Queensland, Toowoomba, QLD, Australia
Xiaohui Tao & Hua Wang

About this paper

Cite this paper

Wang, D., Gong, B. (2012). On the Checkpointing Strategy in Desktop Grids. In: Xiang, Y., Pathan, M., Tao, X., Wang, H. (eds) Internet and Distributed Computing Systems. IDCS 2012. Lecture Notes in Computer Science, vol 7646. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34883-9_17

DOI: https://doi.org/10.1007/978-3-642-34883-9_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34882-2
Online ISBN: 978-3-642-34883-9
eBook Packages: Computer Science, Computer Science (R0)
