The Journal of Supercomputing

Volume 33, Issue 1–2, pp 65–78

A new approach for high performance computing systems with various checkpointing schemes

  • Gyung-Leen Park
  • Hee Yong Youn
The Journal of Supercomputing, Special Issue on Modeling and Simulation in Supercomputing and Telecommunications


Roll-forward recovery schemes were proposed to enhance the performance of fault-tolerant systems that employ checkpointing. In roll-forward schemes, multiple processors are used for simultaneous roll-forward and validation processing. This paper proposes the sample comparison approach for use with checkpointing, which further improves performance by reducing the overhead that checkpointing imposes. We also develop general analytical models for estimating availability that are applicable to any checkpointing scheme. Performance comparisons reveal that the availabilities of the checkpointing schemes with sample comparison are higher than those of the schemes without it, while the required checkpoint interval is larger.


availability, checkpointing, fault-tolerant, rollback, roll-forward





Copyright information

© Springer Science + Business Media, Inc 2005

Authors and Affiliations

  1. Department of Computer Science and Statistics, Cheju National University, Cheju, Korea
  2. School of Information and Communications Engineering, Sungkyunkwan University, Suwon, Korea
