A Framework for Automated Fault Recovery Planning in Large-Scale Virtualized Infrastructures
Multi-layered provisioning architectures, such as those in emergent virtualized (e.g., cloud) infrastructures, exacerbate the cost of faults to a degree where automation effectively constitutes a prerequisite for operations. The acquisition of management information and the execution of routine tasks have been automated to some degree; however, the decision processes behind fault management in large-scale environments have not. This paper addresses the automation of such decision processes by proposing a planning-based fault recovery algorithm built on hierarchical task networks, together with data models for the knowledge that the recovery process requires. We embed these concepts in a generic architecture and evaluate its prototypical implementation with respect to function and scalability.
Keywords: fault management, AI planning, virtualization, cloud computing