Group Communication: From Practice to Theory

Schiper, André

doi:10.1007/11611257_10

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3831)

Abstract

Improving the dependability of computer systems is a critical task. In this context, the paper surveys techniques for achieving fault tolerance in distributed systems through replication. The main replication techniques are explained first. Group communication is then introduced as the communication infrastructure on which the different replication techniques can be implemented. Finally, the difficulty of implementing group communication is discussed, and the most important algorithms are presented.

The same paper will appear under the title Dependable Systems in Dependable Information and Communication Systems, to be published in the Springer LNCS series, 2006. Research supported by the Hasler Stiftung under grant number DICS-1825.

Author information

Authors and Affiliations

Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015, Lausanne, Switzerland
André Schiper

Editor information

Editors and Affiliations

Institute of Computer Science, Academy of Sciences of the Czech Republic, Pod Vodárenskou věží 2, 182 07, Prague 8, Czech Republic
Jiří Wiedermann & Július Štuller
Department of Information and Computer Sciences, University of Utrecht, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands
Gerard Tel
Faculty of Mathematics and Physics, Charles University, Prague
Jaroslav Pokorný
Institute of Informatics and Software Engineering Faculty of Informatics and Information technologies, Slovak University of Technology, Ilkovičova 3, 842 16, Bratislava
Mária Bieliková

About this paper

Cite this paper

Schiper, A. (2006). Group Communication: From Practice to Theory. In: Wiedermann, J., Tel, G., Pokorný, J., Bieliková, M., Štuller, J. (eds) SOFSEM 2006: Theory and Practice of Computer Science. SOFSEM 2006. Lecture Notes in Computer Science, vol 3831. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11611257_10

DOI: https://doi.org/10.1007/11611257_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31198-0
Online ISBN: 978-3-540-32217-7
eBook Packages: Computer Science (R0)
