Recovering from Cloud Application Deployment Failures Through Re-execution

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10230)

Abstract

In this paper we study the problem of automated cloud application deployment and configuration. Transient failures are common in current cloud infrastructures, owing to the complexity of the software and hardware stacks involved. These errors affect cloud application deployment, forcing users to check and intervene in the deployment process manually. To address this challenge, we propose a simple yet powerful deployment methodology with error recovery features that works by identifying the script dependencies and re-executing the appropriate configuration scripts. To guarantee idempotent script execution, we adopt a filesystem snapshot mechanism that enables our approach to revert to a healthy filesystem state in case of failed script executions. Our experimental analysis indicates that our approach can resolve any transient deployment failure appearing during the deployment phase, even in highly unpredictable cloud environments.
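The idea in the abstract (order scripts by their dependencies, snapshot before each script, revert and re-execute on a transient failure) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `deploy`, `TransientError`, and `max_retries` are hypothetical, scripts are modelled as plain Python callables, and an in-memory dictionary with a deep copy stands in for the filesystem and its snapshot mechanism.

```python
import copy
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class TransientError(Exception):
    """Hypothetical marker for a failure that may succeed on retry."""


def deploy(scripts, deps, state, max_retries=3):
    """Run each script after its dependencies, re-executing on failure.

    scripts: name -> callable(state); stands in for a configuration script.
    deps:    name -> set of script names that must run first.
    state:   dict standing in for the filesystem (an assumption of this sketch).
    Returns the order in which the scripts completed.
    """
    graph = {name: deps.get(name, set()) for name in scripts}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        for _attempt in range(max_retries):
            # Snapshot the "filesystem" before the script touches it.
            snapshot = copy.deepcopy(state)
            try:
                scripts[name](state)
                break  # script succeeded, move on to the next one
            except TransientError:
                # Revert partial side effects so the retry starts clean.
                state.clear()
                state.update(snapshot)
        else:
            raise RuntimeError(f"{name} still failing after {max_retries} attempts")
    return order
```

Reverting to the snapshot before each retry is what makes re-execution safe in this sketch even when a script is not idempotent on its own: a script that wrote partial results before failing sees the same starting state on its next attempt.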

Notes

  1. Note that message transmission might not be instant (as implied by the Figure) since consumption of a specific message might occur much later than the message post, but the arrows are depicted perpendicular to the time axis for simplicity.

Author information

Corresponding author

Correspondence to Ioannis Giannakopoulos.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Giannakopoulos, I., Konstantinou, I., Tsoumakos, D., Koziris, N. (2017). Recovering from Cloud Application Deployment Failures Through Re-execution. In: Sellis, T., Oikonomou, K. (eds) Algorithmic Aspects of Cloud Computing. ALGOCLOUD 2016. Lecture Notes in Computer Science, vol 10230. Springer, Cham. https://doi.org/10.1007/978-3-319-57045-7_7

  • DOI: https://doi.org/10.1007/978-3-319-57045-7_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57044-0

  • Online ISBN: 978-3-319-57045-7

  • eBook Packages: Computer Science, Computer Science (R0)
