Review of Some Checkpointing Algorithms for Distributed and Mobile Systems

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 196)

Abstract

A distributed system is a collection of independent entities that cooperate to solve a problem that cannot be solved individually. A mobile computing system is a distributed system in which some of the processes run on mobile hosts (MHs). A checkpoint is a designated place in a program at which normal processing is interrupted specifically to preserve the status information necessary to allow processing to resume at a later time. Checkpointing is the process of saving this status information. Over the past two decades, intensive research has been carried out on efficient checkpointing protocols for traditional distributed computing. The presence of mobile nodes in a distributed system introduces new issues that need proper handling when designing a checkpointing algorithm for such systems: mobility, disconnections, a finite power source, vulnerability to physical damage, and lack of stable storage, among others. Recently, more attention has been paid to checkpointing protocols for mobile systems. This paper surveys the algorithms reported in the literature for checkpointing in distributed systems as well as in mobile distributed systems.
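To make the definition of a checkpoint concrete, the following is a minimal single-process sketch in Python. It is not any of the surveyed algorithms and involves no coordination between processes. The function names (`save_checkpoint`, `load_checkpoint`, `run`), the state layout, and the use of a local file as stable storage are assumptions made only for illustration.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Save state to `path` (assumed here to stand in for stable storage).

    The state is written to a temporary file first and then renamed over
    the target, so a failure mid-write does not corrupt the last checkpoint.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the last saved state, or None if no checkpoint exists."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run(path, total_steps, checkpoint_every, crash_at=None):
    """Sum the integers 0..total_steps-1, checkpointing periodically.

    `crash_at` is a hypothetical hook that simulates a failure just
    before the given step is processed.
    """
    # Resume from the last checkpoint if there is one.
    state = load_checkpoint(path) or {"step": 0, "acc": 0}
    while state["step"] < total_steps:
        if crash_at is not None and state["step"] == crash_at:
            raise RuntimeError("simulated failure")
        state["acc"] += state["step"]
        state["step"] += 1
        # The checkpoint: normal processing pauses to save status information.
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["acc"]
```

If a first call to `run` fails partway through, a second call with the same path resumes from the last saved step rather than from the beginning. Work done after the last checkpoint and before the failure is redone, which is the rollback cost that checkpointing protocols try to bound.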

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gupta, S.K., Kumar, P. (2011). Review of Some Checkpointing Algorithms for Distributed and Mobile Systems. In: Wyld, D.C., Wozniak, M., Chaki, N., Meghanathan, N., Nagamalai, D. (eds) Advances in Network Security and Applications. CNSA 2011. Communications in Computer and Information Science, vol 196. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22540-6_17

  • DOI: https://doi.org/10.1007/978-3-642-22540-6_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22539-0

  • Online ISBN: 978-3-642-22540-6

  • eBook Packages: Computer Science, Computer Science (R0)
