A Comparative Study at the Logical Level of Centralised and Distributed Recovery in Clusters

  • Conference paper
Distributed and Parallel Computing (ICA3PP 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3719)

Abstract

Cluster systems are becoming increasingly prevalent, and users are beginning to demand that these systems be reliable. Most clusters, however, have been designed to deliver high performance while providing little or no reliability. To address this, this paper examines how a recovery facility, based on either a centralised or a distributed approach, could be implemented in a cluster supported by a checkpointing facility. Such a recovery facility can recover failed user processes from checkpoints of those processes taken during failure-free execution.
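To make the contrast between the two approaches concrete, the following is a minimal in-memory sketch in Python. It is not the authors' design, and every name in it (`Node`, `Checkpoint`, `take_checkpoint`, `centralised_recover`, `distributed_recover`, and the `registry` mapping) is hypothetical. It assumes that each process has exactly one checkpoint, stored on a single backup node, and that failure detection has already happened (modelled by setting `alive = False`). In the centralised variant, one recovery manager consults a global registry of checkpoint locations and restarts each failed process on the least-loaded surviving node (an assumed placement policy). In the distributed variant, each surviving node independently restarts the failed processes whose checkpoints it holds.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Checkpoint:
    pid: int
    seq: int     # increases with each checkpoint taken during failure-free execution
    state: dict  # hypothetical stand-in for a saved process image


@dataclass
class Node:
    name: str
    alive: bool = True
    running: Dict[int, dict] = field(default_factory=dict)       # pid -> live process state
    stored: Dict[int, Checkpoint] = field(default_factory=dict)  # checkpoints held on behalf of other nodes


def take_checkpoint(home: Node, backup: Node, pid: int, seq: int) -> None:
    """Copy a process's current state from its home node to a backup node."""
    backup.stored[pid] = Checkpoint(pid, seq, dict(home.running[pid]))


def failed_pids(nodes: List[Node]) -> Set[int]:
    """Processes that were running on nodes that have failed."""
    return {pid for n in nodes if not n.alive for pid in n.running}


def centralised_recover(nodes: List[Node], registry: Dict[int, str]) -> Dict[int, str]:
    """A single recovery manager looks up each failed process's checkpoint
    location in a global registry (pid -> backup node name) and restarts the
    process on the least-loaded surviving node. Returns pid -> restart node."""
    by_name = {n.name: n for n in nodes}
    survivors = [n for n in nodes if n.alive]
    placed: Dict[int, str] = {}
    for pid in sorted(failed_pids(nodes)):
        backup = by_name.get(registry.get(pid, ""))
        if backup is None or not backup.alive or pid not in backup.stored:
            continue  # checkpoint unavailable: the process cannot be recovered
        target = min(survivors, key=lambda n: len(n.running))
        target.running[pid] = dict(backup.stored[pid].state)
        placed[pid] = target.name
    return placed


def distributed_recover(nodes: List[Node]) -> Dict[int, str]:
    """No coordinator: each surviving node independently restarts, locally,
    every failed process whose checkpoint it holds (one copy per process is
    assumed, so no process is restarted twice). Returns pid -> restart node."""
    lost = failed_pids(nodes)
    placed: Dict[int, str] = {}
    for n in nodes:
        if not n.alive:
            continue
        for pid, ckpt in n.stored.items():
            if pid in lost:
                n.running[pid] = dict(ckpt.state)
                placed[pid] = n.name
    return placed


if __name__ == "__main__":
    a, b, c = Node("A"), Node("B"), Node("C")
    a.running[1] = {"counter": 5}
    b.running[2] = {"counter": 0}
    take_checkpoint(a, b, pid=1, seq=1)
    a.running[1]["counter"] = 9  # progress after the checkpoint; lost when A fails
    a.alive = False
    print(centralised_recover([a, b, c], {1: "B"}))  # {1: 'C'}
    print(c.running[1])                              # {'counter': 5}
```

Any state a process accumulated after its last checkpoint is lost, which is why the restarted process resumes from the checkpointed `counter` value of 5 rather than 9. The trade-off illustrated here is a general one, not a result reported in the paper: the central manager has a global view it can use for placement but is itself a single point of failure, whereas the distributed variant needs no coordinator but restarts processes wherever their checkpoints happen to reside.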

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Maloney, A., Goscinski, A. (2005). A Comparative Study at the Logical Level of Centralised and Distributed Recovery in Clusters. In: Hobbs, M., Goscinski, A.M., Zhou, W. (eds) Distributed and Parallel Computing. ICA3PP 2005. Lecture Notes in Computer Science, vol 3719. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564621_13

  • DOI: https://doi.org/10.1007/11564621_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29235-7

  • Online ISBN: 978-3-540-32071-5

  • eBook Packages: Computer Science, Computer Science (R0)
