Group Communication: From Practice to Theory

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3831)

Abstract

Improving the dependability of computer systems is a critical task. In this context, the paper surveys techniques for achieving fault tolerance in distributed systems through replication. The main replication techniques are explained first. Group communication is then introduced as the communication infrastructure on which the different replication techniques can be implemented. Finally, the difficulty of implementing group communication is discussed, and the most important algorithms are presented.

The same paper will appear under the title "Dependable Systems" in Dependable Information and Communication Systems, to be published in the Springer LNCS series, 2006. Research supported by the Hasler Stiftung under grant number DICS-1825.




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schiper, A. (2006). Group Communication: From Practice to Theory. In: Wiedermann, J., Tel, G., Pokorný, J., Bieliková, M., Štuller, J. (eds) SOFSEM 2006: Theory and Practice of Computer Science. SOFSEM 2006. Lecture Notes in Computer Science, vol 3831. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11611257_10


  • DOI: https://doi.org/10.1007/11611257_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-31198-0

  • Online ISBN: 978-3-540-32217-7

  • eBook Packages: Computer Science, Computer Science (R0)
