A Novel Recovery Approach for Cluster Federations

Gupta, Bidyut; Rahimi, Shahram; Ahmad, Raheel; Chirra, Raja

doi:10.1007/978-3-540-72360-8_44

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4459)

Included in the following conference series:

International Conference on Grid and Pervasive Computing

Abstract

In this paper, we address the complex problem of determining a recovery line for a cluster federation and propose a fast recovery algorithm to handle failures in cluster federations. The main feature of the proposed algorithm is that all clusters in the federation can execute it simultaneously. In addition, it requires far fewer trips to stable storage than some existing approaches. Unlike some notable work in this area, the proposed algorithm also does not suffer from message storms.

Author information

Authors and Affiliations

Department of Computer Science, Southern Illinois University, Carbondale IL 62901, USA
Bidyut Gupta, Shahram Rahimi, Raheel Ahmad & Raja Chirra

Editor information

Christophe Cérin, Kuan-Ching Li

Cite this paper

Gupta, B., Rahimi, S., Ahmad, R., Chirra, R. (2007). A Novel Recovery Approach for Cluster Federations. In: Cérin, C., Li, KC. (eds) Advances in Grid and Pervasive Computing. GPC 2007. Lecture Notes in Computer Science, vol 4459. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72360-8_44

DOI: https://doi.org/10.1007/978-3-540-72360-8_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72359-2
Online ISBN: 978-3-540-72360-8
eBook Packages: Computer Science, Computer Science (R0)
