Failure Recovery in Distributed Environments with Advance Reservation Management Systems

  • Lars-Olof Burchard
  • Barry Linnert
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3278)

Abstract

Advance resource reservations are a mature concept for allocating various resources, particularly in grid environments. Common grid toolkits such as Globus support advance reservations and assign jobs to resources at admission time. While allocation mechanisms for advance reservations are available in current grid management systems, failure handling under advance reservations demands strategies that go beyond recovering the jobs or applications that are active when a resource failure occurs. Applications that have already been admitted but not yet started are also affected by the failure and hence need to be dealt with appropriately. In this paper, we discuss the properties of advance reservations with respect to failure recovery and outline a number of strategies applicable in such cases to reduce the impact of resource failures and outages. It can be shown that it pays to also remap affected but not yet started jobs to alternative resources, if available. By analogy with reserving in advance, this can be considered remapping in advance. In particular, a remapping strategy that prefers requests that were allocated a long time ago provides high fairness for clients, since it implements functionality similar to that of advance reservations, while achieving the same performance as the other strategies.
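
The idea of remapping in advance with a preference for the oldest allocations can be illustrated with a minimal sketch. It is not the authors' implementation: the `Reservation` fields, the `remap_in_advance` function, the node-count capacity model, and the deliberately conservative free-capacity check are all assumptions made for illustration, and the sketch treats running and not yet started jobs alike.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    # All fields are hypothetical; the abstract does not define a data model.
    job_id: str
    admitted_at: int  # time at which the reservation was accepted
    start: int        # reserved start time
    end: int          # reserved end time (exclusive)
    nodes: int        # number of nodes reserved
    resource: str     # name of the resource the job is mapped to


def overlaps(a_start, a_end, b_start, b_end):
    """True if the half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def free_nodes(resource, capacity, reservations, start, end):
    """Conservative estimate: subtract every reservation on the resource that
    overlaps [start, end) at all, even if those reservations never coincide."""
    used = sum(
        r.nodes
        for r in reservations
        if r.resource == resource and overlaps(r.start, r.end, start, end)
    )
    return capacity - used


def remap_in_advance(reservations, capacities, failed, fail_start, fail_end):
    """Try to move every reservation hit by the outage to another resource,
    handling the reservations that were admitted earliest first."""
    affected = [
        r
        for r in reservations
        if r.resource == failed and overlaps(r.start, r.end, fail_start, fail_end)
    ]
    affected.sort(key=lambda r: r.admitted_at)  # oldest admission first

    remapped, dropped = [], []
    for r in affected:
        for name, capacity in capacities.items():
            if name == failed:
                continue
            if free_nodes(name, capacity, reservations, r.start, r.end) >= r.nodes:
                r.resource = name  # later checks now see this job on its new resource
                remapped.append(r.job_id)
                break
        else:
            dropped.append(r.job_id)  # no alternative resource had enough room
    return remapped, dropped
```

When two affected jobs compete for the same alternative resource, sorting by `admitted_at` means the job that was admitted earlier is placed first, which is the fairness property the abstract attributes to this strategy.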

Keywords

Service Level Agreement, Grid Environment, Alternative Resource, Failure Recovery, Advance Reservation
These keywords were added by machine and not by the authors.

Copyright information

© IFIP International Federation for Information Processing 2004

Authors and Affiliations

  • Lars-Olof Burchard (1)
  • Barry Linnert (1)
  1. Technische Universitaet Berlin, Germany
