
Fault Tolerance and Recovery for Group Communication Services in Distributed Networks

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Group communication services (GCSs) are becoming increasingly important as a wide range of promising applications has emerged to serve millions of users distributed across the world. However, it is challenging to make such services fault tolerant and scalable enough to meet the voluminous demand of users in a distributed network (DN). Many reliable group communication protocols have been designed to address this challenge by accommodating changes in the network, but they are often costly or require complicated strategies to handle the service interruptions caused by node departures or link failures, which limits their practicality. In this paper, we present two schemes to address these challenges. The first is a location-aware replication scheme called NS, which places replicas in a dispersed fashion so that the services on nodes become immune to failures with different patterns (e.g., network partitions and single-point failures) while keeping replication overhead low. The second is a novel failure recovery scheme that exploits the independence between service recovery and structure recovery in the time domain to achieve quick failure recovery. Our simulation results indicate that the two proposed schemes outperform existing schemes and simple alternative schemes in service success rate, recovery latency, and communication cost.
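The abstract does not say how NS chooses where replicas go, only that they are dispersed so that a network partition or a single point of failure does not remove every copy of a service. As a rough illustration of that general idea (not the authors' NS algorithm), the Python sketch below places replicas greedily: each new replica is the candidate farthest from the primary and the replicas already chosen, with candidates in a not-yet-covered failure domain taking priority. The `Node` type, its 2-D network coordinates, the `region` labels used as failure domains, and the farthest-point rule are all assumptions made for this sketch.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    x: float      # hypothetical 2-D network coordinate (e.g., from a latency embedding)
    y: float
    region: str   # hypothetical failure-domain label (e.g., a site or AS)


def distance(a: Node, b: Node) -> float:
    """Euclidean distance between two nodes in the assumed coordinate space."""
    return math.hypot(a.x - b.x, a.y - b.y)


def place_replicas(primary: Node, candidates: list[Node], k: int) -> list[Node]:
    """Choose up to k replica holders for a service hosted on `primary`.

    Greedy farthest-point selection: a candidate in an uncovered region always
    beats one in a covered region; ties are broken by the candidate's distance
    to the nearest node already holding a copy (larger is better).
    """
    chosen: list[Node] = []
    covered = {primary.region}
    pool = [n for n in candidates if n != primary]
    while pool and len(chosen) < k:
        def score(n: Node) -> tuple[bool, float]:
            nearest = min(distance(n, m) for m in [primary, *chosen])
            return (n.region not in covered, nearest)

        best = max(pool, key=score)
        chosen.append(best)
        covered.add(best.region)
        pool.remove(best)
    return chosen


def survives_region_failure(primary: Node, replicas: list[Node]) -> bool:
    """True if losing any single region still leaves at least one copy.

    That holds exactly when the copies span at least two distinct regions.
    """
    return len({n.region for n in [primary, *replicas]}) >= 2
```

With a primary `Node("a", 0, 0, "east")` and candidates `b` (east, 1 unit away), `c` (west), `d` (north) and `e` (west), asking for two replicas selects `c` and then `d`. A naive "nearest neighbour" placement would instead pick `b`, which sits in the same region as the primary, so a single regional failure would take down every copy.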



Author information


Correspondence to Yue-Hua Wang.

Additional information

This work is partially supported by a National Science Foundation (NSF) grant from the CISE NetSE Program and the CyberTrust Cross-Cutting Program of the USA, an IBM faculty award, an IBM SUR grant, a grant from the Intel Research Council, the National Basic Research 973 Program of China under Grant No. 2009CB320805, the National Natural Science Foundation of China under Grant No. 61170188, the National High Technology Research and Development 863 Program of China under Grant No. 2012AA011803, and the Fundamental Research Funds for the Central Universities of China. The first author was supported by the China Scholarship Council (CSC) and performed part of the work as a visiting Ph.D. candidate from 2007 to 2009 at the Distributed Data Intensive Systems Lab (DiSL) at the Georgia Institute of Technology.

Electronic Supplementary Material

Electronic supplementary material is available as a PDF (87.6 kb).


About this article

Cite this article

Wang, YH., Zhou, Z., Liu, L. et al. Fault Tolerance and Recovery for Group Communication Services in Distributed Networks. J. Comput. Sci. Technol. 27, 298–312 (2012). https://doi.org/10.1007/s11390-012-1224-1


  • DOI: https://doi.org/10.1007/s11390-012-1224-1
