A High-Performance Adaptive Strategy of Container Checkpoint Based on Pre-replication
Checkpoint downtime is a pivotal performance indicator for any container checkpoint strategy, and short downtime is especially important for systems that provide critical services. To reduce checkpoint downtime, this paper proposes an adaptive pre-replication checkpoint strategy named APR-CKPOT. The strategy runs several rounds of pre-replication: infrequently modified container memory pages are copied first, and each subsequent round saves the dirty pages generated during the previous round. The number of pre-replication checkpoints is determined adaptively from the workload of the user's operating system in the container. The strategy coordinates the container's fault-tolerance service capability with its performance and reduces checkpoint downtime, as verified by experimental results on a Docker container system.
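The round structure described in the abstract can be illustrated with a small simulation. This is a minimal sketch of a generic iterative pre-copy loop, not the APR-CKPOT algorithm itself: the function name `pre_replication_checkpoint`, the parameters `dirty_rate`, `max_rounds`, and `stop_threshold`, and the dirtying model (pages dirtied in a round are proportional to the pages copied in that round) are all assumptions made for illustration.

```python
import random


def pre_replication_checkpoint(num_pages, dirty_rate, max_rounds,
                               stop_threshold, seed=0):
    """Simulate an iterative pre-copy checkpoint (illustrative only).

    Returns (rounds, pages_copied_live, pages_copied_during_downtime).
    """
    rng = random.Random(seed)
    to_copy = set(range(num_pages))  # the first round copies every page
    copied_live = 0
    rounds = 0

    # Keep pre-replicating while the remaining dirty set is still large
    # and the round budget is not exhausted.
    while rounds < max_rounds and len(to_copy) > stop_threshold:
        rounds += 1
        copied_live += len(to_copy)
        # Assumed workload model: while this round was copying, the running
        # container dirtied a number of pages proportional to the copy size.
        n_dirty = int(len(to_copy) * dirty_rate)
        to_copy = set(rng.sample(range(num_pages), n_dirty))

    # Whatever is still dirty is copied while the container is paused,
    # so its size stands in for checkpoint downtime.
    return rounds, copied_live, len(to_copy)


if __name__ == "__main__":
    for rate in (0.2, 0.5):
        print(rate, pre_replication_checkpoint(1000, rate, 10, 10))
```

In this toy model a heavier workload (a larger `dirty_rate`) needs more rounds before the remaining dirty set is small enough to copy during the pause, which is the kind of workload-dependent round count the abstract refers to.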
Keywords: Docker Container Fault-tolerance Pre-replication Checkpoints
This work is supported by the Natural Science Foundation of China (No. 61762008), the Natural Science Foundation Project of Guangxi (No. 2017GXNSFAA198141), and the Key R&D Project of Guangxi (No. AB17195014).