Group Communication Systems

Birman, Kenneth P.

doi:10.1007/978-1-4471-2416-0_12

Part of the book series: Texts in Computer Science (TCS)

Abstract

Our goal in Chap. 12 is to identify the best options for implementing high-speed data replication and the other tools needed for fault-tolerant, highly assured Web Services and other forms of distributed computing. Given the GMS created in Chap. 11, one option would be to plunge right in and build replicated applications that use the protocol directly. The approach builds on the GMS, using it to create protocols that can operate only under the assumption that, if a failure occurs, the GMS will be notified and will reconfigure the system appropriately: it notifies the members of the new system configuration of their new state and takes steps to shut down any old members that were unreachable but later recover. We arrive at a rich collection of protocols and establish a subtle linkage to the Paxos framework.

Notes

1. Most ordered of all is the flush protocol used to install new views: it delivers one type of message (the new view) in a way that is ordered with respect to all other types of message. The Isis Toolkit also offered a SafeSend primitive that gave the user this behavior on request, but it was rarely used, and more recent systems tend to use this protocol only to install new process group views.
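
To make the ordering property in this note concrete, here is a minimal sketch of a flush, assuming a single coordinator that can read every survivor's state directly. The names Member, receive, install, and flush are hypothetical illustrations and are not the Isis Toolkit API; a real protocol would exchange these messages over the network and tolerate failures during the flush itself.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """One group member. All names here are hypothetical, not a real API."""
    name: str
    received: set = field(default_factory=set)  # multicasts seen in the current view
    log: list = field(default_factory=list)     # delivery order seen by the application

    def receive(self, msg: str) -> None:
        # Deliver each multicast at most once.
        if msg not in self.received:
            self.received.add(msg)
            self.log.append(("msg", msg))

    def install(self, view_id: int, members: tuple) -> None:
        # The new view is delivered like a message, then a fresh epoch begins.
        self.log.append(("view", view_id, members))
        self.received = set()


def flush(survivors: list, view_id: int) -> None:
    """Install a new view so that it is ordered after every old-view message."""
    # Step 1: pool every message that any survivor received in the old view.
    pending = sorted(set().union(*(m.received for m in survivors)))
    # Step 2: forward to each survivor whatever it is missing.
    for m in survivors:
        for msg in pending:
            m.receive(msg)
    # Step 3: only now deliver the new view to everyone.
    names = tuple(m.name for m in survivors)
    for m in survivors:
        m.install(view_id, names)


# A sender crashes after "m2" reached only member a.
a, b = Member("a"), Member("b")
a.receive("m1")
b.receive("m1")
a.receive("m2")
flush([a, b], view_id=2)
print(a.log)
print(b.log)
```

In this sketch both survivors deliver the same set of old-view messages before either one sees the new view, which is the sense in which the view is ordered with respect to all other messages.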

Author information

Authors and Affiliations

Department of Computer Science, Cornell University, Ithaca, NY, USA
Kenneth P. Birman

About this chapter

Cite this chapter

Birman, K.P. (2012). Group Communication Systems. In: Guide to Reliable Distributed Systems. Texts in Computer Science. Springer, London. https://doi.org/10.1007/978-1-4471-2416-0_12

DOI: https://doi.org/10.1007/978-1-4471-2416-0_12
Publisher Name: Springer, London
Print ISBN: 978-1-4471-2415-3
Online ISBN: 978-1-4471-2416-0
eBook Packages: Computer Science, Computer Science (R0)