A High-Performance Adaptive Strategy of Container Checkpoint Based on Pre-replication

  • Shuo Zhang
  • Ningjiang Chen
  • Hanlin Zhang
  • Yijun Xue
  • Ruwei Huang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11342)


Checkpoint downtime is a pivotal performance indicator for a container checkpoint strategy, and short downtime is especially important for systems that provide critical services. To reduce checkpoint downtime, this paper proposes an adaptive pre-replication checkpoint strategy named APR-CKPOT. Over several rounds of pre-replication, infrequently modified container memory pages are copied first, and each round saves the dirty pages generated during the previous round. The number of pre-replication checkpoints is determined adaptively from the workload of the user's operating system in the container. The strategy balances the container's fault-tolerance service capability against its performance and reduces checkpoint downtime, as verified by experimental results on a Docker container system.
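The round structure described above can be illustrated with a small simulation. This is a minimal sketch of generic iterative pre-copy, not the APR-CKPOT algorithm itself, whose details are not given here. The function `pre_copy_checkpoint`, the parameters `stop_threshold` and `max_rounds`, and the workload model `light_workload` are all hypothetical names chosen for illustration, and the stopping rule (stop when the dirty set is small or no longer shrinking) is an assumed stand-in for the paper's workload-based adaptation.

```python
from typing import Callable, List, Set, Tuple


def pre_copy_checkpoint(
    num_pages: int,
    dirtied_during_round: Callable[[int], Set[int]],
    stop_threshold: int = 8,   # assumed: pause once this few pages remain dirty
    max_rounds: int = 10,      # assumed: hard cap on live rounds
) -> Tuple[List[int], int]:
    """Simulate iterative pre-copy.

    Returns (pages copied in each live round, pages copied while paused).
    """
    to_copy: Set[int] = set(range(num_pages))  # the first round copies all memory
    copied_per_round: List[int] = []
    for round_no in range(max_rounds):
        copied_per_round.append(len(to_copy))
        # Pages written while this round was copying must be saved again.
        dirty = dirtied_during_round(round_no)
        converged = len(dirty) <= stop_threshold
        stalled = round_no > 0 and len(dirty) >= len(to_copy)
        to_copy = dirty
        if converged or stalled:
            break
    # The container is paused only while the final dirty set is copied.
    return copied_per_round, len(to_copy)


def light_workload(round_no: int) -> Set[int]:
    # Hypothetical workload whose write set halves every round.
    return set(range(64 >> round_no))


rounds, final_pages = pre_copy_checkpoint(1024, light_workload)
print(rounds, final_pages)
```

In this toy model the downtime is proportional to the final dirty set rather than to total memory. A workload that keeps dirtying the same number of pages every round triggers the `stalled` branch, so the loop stops early instead of repeating rounds that no longer reduce the remaining work.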


Keywords: Docker Container Fault-tolerance Pre-replication Checkpoints



This work is supported by the Natural Science Foundation of China (No. 61762008), the Natural Science Foundation Project of Guangxi (No. 2017GXNSFAA198141), and the Key R&D Project of Guangxi (No. AB17195014).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Shuo Zhang (1)
  • Ningjiang Chen (1)
  • Hanlin Zhang (1)
  • Yijun Xue (1)
  • Ruwei Huang (1)
  1. School of Computer and Electronic Information, Guangxi University, Nanning, China
