
A Framework for Automated Fault Recovery Planning in Large-Scale Virtualized Infrastructures

  • Conference paper
Modelling Autonomic Communication Environments (MACE 2010)

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 6473)

Abstract

Multi-layered provisioning architectures, such as those found in emerging virtualized (e.g. cloud) infrastructures, amplify the cost of faults to a degree where automation effectively becomes a prerequisite for operations. The acquisition of management information and the execution of routine tasks have been automated to some degree; however, the decision processes behind fault management in large-scale environments have not. This paper addresses the automation of such decision processes by proposing a planning-based fault recovery algorithm built on hierarchical task networks (HTNs), together with data models for the knowledge the recovery process requires. We embed these concepts in a generic architecture and evaluate a prototype implementation with respect to functionality and scalability.
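The abstract does not spell out the recovery algorithm itself. As a rough illustration of what HTN-based recovery planning involves, the following is a minimal sketch of generic total-order HTN decomposition with depth-first backtracking (in the style of SHOP-like planners), applied to a hypothetical VM-recovery domain. The task names (`recover_service`, `recover_vm`), the operators (`start_vm`, `migrate_vm`) and the state layout are illustrative assumptions, not the authors' data models or algorithm.

```python
import copy
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical toy domain: every task, operator and state key below is an
# illustrative assumption, not taken from the paper.
Task = Tuple[str, ...]   # (task_name, arg1, arg2, ...)
State = dict


# --- Primitive operators: return a new state, or None if preconditions fail ---
def op_start_vm(state: State, vm: str) -> Optional[State]:
    """Start a VM, provided its current host is up."""
    if state["host_of"][vm] not in state["hosts_up"]:
        return None
    new = copy.deepcopy(state)
    new["running"].add(vm)
    return new


def op_migrate_vm(state: State, vm: str, dst: str) -> Optional[State]:
    """Move a stopped VM to a healthy destination host."""
    if dst not in state["hosts_up"] or vm in state["running"]:
        return None
    new = copy.deepcopy(state)
    new["host_of"][vm] = dst
    return new


OPERATORS: Dict[str, Callable[..., Optional[State]]] = {
    "start_vm": op_start_vm,
    "migrate_vm": op_migrate_vm,
}


# --- Methods: yield alternative decompositions of a compound task ---
def m_recover_all(state: State, service: str) -> Iterator[List[Task]]:
    """Recover every stopped VM that belongs to the service."""
    yield [("recover_vm", vm) for vm in sorted(state["vms_of"][service])
           if vm not in state["running"]]


def m_restart_in_place(state: State, vm: str) -> Iterator[List[Task]]:
    """Preferred: if the VM's host is healthy, just restart it there."""
    if state["host_of"][vm] in state["hosts_up"]:
        yield [("start_vm", vm)]


def m_failover(state: State, vm: str) -> Iterator[List[Task]]:
    """Fallback: migrate to each other healthy host in turn, then start."""
    for host in sorted(state["hosts_up"]):
        if host != state["host_of"][vm]:
            yield [("migrate_vm", vm, host), ("start_vm", vm)]


METHODS: Dict[str, List[Callable[..., Iterator[List[Task]]]]] = {
    "recover_service": [m_recover_all],
    "recover_vm": [m_restart_in_place, m_failover],
}


def plan(state: State, tasks: List[Task]) -> Optional[List[Task]]:
    """Decompose tasks left to right; return a list of primitive steps or None."""
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]
    name, args = head[0], head[1:]
    if name in OPERATORS:
        new_state = OPERATORS[name](state, *args)
        if new_state is None:
            return None          # precondition failed: backtrack
        tail = plan(new_state, rest)
        return None if tail is None else [head] + tail
    for method in METHODS.get(name, []):
        for subtasks in method(state, *args):
            result = plan(state, list(subtasks) + rest)
            if result is not None:
                return result    # first successful decomposition wins
    return None


if __name__ == "__main__":
    initial = {
        "hosts_up": {"h2", "h3"},              # host h1 has failed
        "host_of": {"vm1": "h1", "vm2": "h2"},
        "running": set(),
        "vms_of": {"web": ["vm1", "vm2"]},
    }
    print(plan(initial, [("recover_service", "web")]))
```

Run on the example state, the sketch restarts `vm2` in place and fails `vm1` over to `h2`, returning `[('migrate_vm', 'vm1', 'h2'), ('start_vm', 'vm1'), ('start_vm', 'vm2')]`. Ordering the methods encodes a simple preference (cheap restart before migration). A real recovery planner would more likely rank alternatives by cost or utility rather than taking the first plan found.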




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liu, F., Danciu, V.A., Kerestey, P. (2010). A Framework for Automated Fault Recovery Planning in Large-Scale Virtualized Infrastructures. In: Brennan, R., Fleck, J., van der Meer, S. (eds) Modelling Autonomic Communication Environments. MACE 2010. Lecture Notes in Computer Science, vol 6473. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16836-9_10


  • DOI: https://doi.org/10.1007/978-3-642-16836-9_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16835-2

  • Online ISBN: 978-3-642-16836-9

  • eBook Packages: Computer Science, Computer Science (R0)
