Dynamic replication factor model for Linux containers-based cloud systems

  • Heithem Abbes
  • Thouraya Louati
  • Christophe Cérin


Infrastructure-as-a-service container-based virtualization is gaining interest as a platform for running distributed applications. As cloud architectures grow in scale, faults become a frequent occurrence, which makes availability a true challenge. Replication, whether of checkpoints, containers, or data, is a way to survive failures and increase availability. Following a node failure, fault-tolerant cloud systems restart the failed containers on a new node from distributed container images (or checkpoints). With a high failure rate, however, some replicas can be lost. It is therefore worthwhile in some cases to increase the replication factor and to find a trade-off between being able to restart all failed containers and the storage overhead. This paper addresses the issue of adapting the replication factor and contributes a novel replication factor modeling approach that predicts the right replication factor using prediction techniques. These techniques are based on experimental modeling, which analyzes data collected from different executions. We use a regression technique to find the relation between availability and the number of replicas. Experiments on the Grid’5000 testbed, using a real fault-tolerant cloud system, demonstrate the benefits of our proposal in satisfying the availability requirement.
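The idea of relating availability to the number of replicas by regression can be illustrated with a minimal sketch. It assumes an ordinary least squares line, which may differ from the regression model used in the paper, and the observations below are hypothetical values chosen only for illustration, not measurements from Grid’5000. The names `fit_line` and `min_replicas` are likewise hypothetical.

```python
from typing import Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def min_replicas(slope: float, intercept: float, target: float,
                 max_replicas: int) -> Optional[int]:
    """Smallest replica count whose predicted availability meets the target.

    Returns None when no count up to max_replicas is predicted to suffice.
    """
    for r in range(1, max_replicas + 1):
        if slope * r + intercept >= target:
            return r
    return None


# Hypothetical observations (illustrative only): the fraction of failed
# containers that could be restarted, for each replication factor tried.
replicas = [1, 2, 3, 4]
availability = [0.80, 0.85, 0.90, 0.95]

slope, intercept = fit_line(replicas, availability)
factor = min_replicas(slope, intercept, target=0.92, max_replicas=10)
print(f"predicted replication factor: {factor}")
```

A straight line is a crude choice here, since availability saturates at 1; the sketch only shows the shape of the workflow (collect observations, fit, invert the fit for a target). The `max_replicas` cap stands in for the storage overhead side of the trade-off.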


Keywords: Cloud computing, Containers, Fault tolerance, Replication, Modeling, Regression, Prediction, Grid’5000




Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Authors and Affiliations

  • Heithem Abbes (1)
  • Thouraya Louati (1, corresponding author)
  • Christophe Cérin (2)

  1. LaTICE Research Lab., ENSIT, University of Tunis, Tunis, Tunisia
  2. LIPN, UMR CNRS 7030, Université de Paris 13, Villetaneuse, France
