Abstract
Multi-layered provisioning architectures, such as those found in emerging virtualized (e.g., cloud) infrastructures, raise the cost of faults to the point where automation is effectively a prerequisite for operations. The acquisition of management information and the execution of routine tasks have been automated to some degree; the decision processes behind fault management in large-scale environments, however, have not. This paper addresses the automation of such decision processes by proposing a planning-based fault recovery algorithm built on hierarchical task networks, together with data models for the knowledge that the recovery process requires. We embed these concepts in a generic architecture and evaluate its prototype implementation with respect to function and scalability.
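To illustrate what planning with hierarchical task networks means in a recovery setting, the following is a minimal sketch of a generic total-order HTN decomposition with backtracking. It is not the algorithm proposed in the paper. The task names (`recover_vm`, `restart_vm`, `migrate_vm`), the state layout, and the host and VM identifiers are all hypothetical and chosen only for illustration.

```python
import copy

# Hypothetical world state: host status, VM status, and VM placement.
EXAMPLE_STATE = {
    "hosts": {"host1": "down", "host2": "up"},
    "vms": {"vm1": "crashed"},
    "placement": {"vm1": "host1"},
}

# Primitive operators: return the successor state, or None if the
# precondition does not hold in the given state.
def restart_vm(state, vm):
    host = state["placement"][vm]
    if state["hosts"][host] != "up":
        return None  # cannot restart a VM on a host that is not up
    new = copy.deepcopy(state)
    new["vms"][vm] = "running"
    return new

def migrate_vm(state, vm, target):
    if state["hosts"].get(target) != "up":
        return None  # the target host must be available
    new = copy.deepcopy(state)
    new["placement"][vm] = target
    new["vms"][vm] = "running"
    return new

OPERATORS = {"restart_vm": restart_vm, "migrate_vm": migrate_vm}

# Method for a compound task: returns alternative decompositions,
# ordered by preference (restart in place first, then migrate).
def recover_vm(state, vm):
    options = [[("restart_vm", vm)]]
    for host in sorted(state["hosts"]):
        if host != state["placement"][vm]:
            options.append([("migrate_vm", vm, host)])
    return options

METHODS = {"recover_vm": recover_vm}

def plan(state, tasks):
    """Return a list of primitive tasks achieving `tasks`, or None."""
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]
    name, args = head[0], head[1:]
    if name in OPERATORS:
        new_state = OPERATORS[name](state, *args)
        if new_state is None:
            return None  # precondition failed; caller backtracks
        tail = plan(new_state, rest)
        return None if tail is None else [head] + tail
    # Compound task: try each decomposition until one yields a plan.
    for subtasks in METHODS[name](state, *args):
        result = plan(state, subtasks + rest)
        if result is not None:
            return result
    return None

if __name__ == "__main__":
    print(plan(EXAMPLE_STATE, [("recover_vm", "vm1")]))
```

In this sketch the compound task `recover_vm` is first decomposed into an in-place restart; because the hypothetical `host1` is down, that operator's precondition fails and the planner backtracks to the migration alternative.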
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, F., Danciu, V.A., Kerestey, P. (2010). A Framework for Automated Fault Recovery Planning in Large-Scale Virtualized Infrastructures. In: Brennan, R., Fleck, J., van der Meer, S. (eds) Modelling Autonomic Communication Environments. MACE 2010. Lecture Notes in Computer Science, vol 6473. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16836-9_10
DOI: https://doi.org/10.1007/978-3-642-16836-9_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16835-2
Online ISBN: 978-3-642-16836-9
eBook Packages: Computer Science, Computer Science (R0)