Dynamic replication factor model for Linux containers-based cloud systems

Abstract

Infrastructure-as-a-service container-based virtualization is gaining interest as a platform for running distributed applications. As cloud architectures grow in scale, faults become a frequent occurrence, which makes availability a true challenge. Replication, whether of checkpoints, containers, or data, is a method for surviving failures and increasing availability. Following a node failure, fault-tolerant cloud systems restart the failed containers on a new node from distributed container images (or checkpoints). With a high failure rate, some replicas can be lost. In some cases it is therefore worthwhile to increase the replication factor and to find the trade-off between being able to restart all failed containers and the storage overhead. This paper addresses the issue of adapting the replication factor and contributes a novel replication factor modeling approach that predicts the right replication factor using prediction techniques. These techniques are based on experimental modeling, which analyzes data collected from different executions. We use a regression technique to find the relation between availability and the number of replicas. Experiments on the Grid’5000 testbed, using a real fault-tolerant cloud system, demonstrate the benefits of our proposal in satisfying the availability requirement.
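As a rough illustration of relating availability to the number of replicas by regression, the following minimal sketch fits a line to log-unavailability and then searches for the smallest replication factor that meets a target. The log-linear form, the function names, the `max_replicas` bound, and the sample observations are all assumptions made for illustration; they are not the model, code, or measurements from the paper.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_availability_model(replicas, availability):
    """Fit log(1 - availability) as a linear function of the replica count.

    Assumption (not from the paper): unavailability shrinks geometrically
    as replicas are added, so its logarithm is linear in the replica count.
    """
    log_unavailability = [math.log(1.0 - a) for a in availability]
    return fit_line(replicas, log_unavailability)


def predict_availability(model, replicas):
    """Availability predicted by the fitted model for a replica count."""
    slope, intercept = model
    return 1.0 - math.exp(slope * replicas + intercept)


def min_replication_factor(model, target, max_replicas=16):
    """Smallest replica count whose predicted availability meets the target.

    Returns None if no count up to max_replicas (a hypothetical storage
    budget) is sufficient.
    """
    for r in range(1, max_replicas + 1):
        if predict_availability(model, r) >= target:
            return r
    return None


# Hypothetical observations for illustration only, NOT measured data.
observed_replicas = [1, 2, 3, 4]
observed_availability = [0.80, 0.96, 0.992, 0.9984]

model = fit_availability_model(observed_replicas, observed_availability)
print(min_replication_factor(model, target=0.999))
```

The `max_replicas` bound stands in for the storage-overhead side of the trade-off: a higher target availability pushes the replication factor up until the bound is reached, at which point the search reports that the requirement cannot be met.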



Author information

Correspondence to Thouraya Louati.

Cite this article

Abbes, H., Louati, T. & Cérin, C. Dynamic replication factor model for Linux containers-based cloud systems. J Supercomput 76, 7219–7241 (2020). https://doi.org/10.1007/s11227-020-03158-5

Keywords

  • Cloud computing
  • Containers
  • Fault tolerance
  • Replication
  • Modeling
  • Regression
  • Prediction
  • Grid’5000