Abstract
In this paper, we address the problem of determining a recovery line for a cluster federation and propose a fast recovery algorithm to handle failures in cluster federations. The main feature of the proposed algorithm is that all clusters in the federation can execute it simultaneously. In addition, it requires far fewer trips to stable storage than some existing approaches. Unlike some notable work in this area, the proposed algorithm also does not suffer from message storms.
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Gupta, B., Rahimi, S., Ahmad, R., Chirra, R. (2007). A Novel Recovery Approach for Cluster Federations. In: Cérin, C., Li, KC. (eds) Advances in Grid and Pervasive Computing. GPC 2007. Lecture Notes in Computer Science, vol 4459. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72360-8_44
DOI: https://doi.org/10.1007/978-3-540-72360-8_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72359-2
Online ISBN: 978-3-540-72360-8
eBook Packages: Computer Science, Computer Science (R0)