Part of the book series: Texts in Computer Science ((TCS))

Abstract

Our goal in Chap. 12 is to identify the best options for implementing high-speed data replication and the other tools needed for fault-tolerant, highly assured Web Services and other forms of distributed computing. Given the GMS created in Chap. 11, one option would be to plunge right in and build replicated applications that use the GMS protocol directly. The approach builds on the GMS, but uses it to create protocols that operate only under the assumption that, if a failure occurs, the GMS will be notified and will reconfigure the system appropriately: it notifies the members of the new system configuration of their new state and takes steps to shut down any old members that were unreachable but later recover. We arrive at a rich collection of protocols and establish a subtle linkage to the Paxos framework.

Notes

  1.

    Most ordered of all is the flush protocol used to install new views: This delivers a type of message (the new view) in a way that is ordered with respect to all other types of message. In the Isis Toolkit, there was actually a SafeSend primitive, which could be used to obtain this behavior at the request of the user, but it was rarely used and more recent systems tend to use this protocol only to install new process group views.
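The ordering property described in this note can be illustrated with a minimal sketch. It is not the Isis Toolkit implementation: the `Member` class, the `flush_and_install` function, and the message names are hypothetical, and the sketch assumes a single view change with in-memory message passing. It shows only the idea that surviving members first exchange any messages some of them are missing, and only then deliver the new view, so the view is ordered after every message of the old view at every survivor.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """Hypothetical group member that records deliveries in order."""
    name: str
    log: list = field(default_factory=list)  # delivered messages, then views

    def deliver(self, msg: str) -> None:
        if msg not in self.log:  # ignore duplicates
            self.log.append(msg)


def flush_and_install(survivors: list, new_view: tuple) -> None:
    """Bring all survivors to the same message set, then install the view."""
    # Step 1: collect every message that any survivor has delivered so far.
    pending = []
    for member in survivors:
        for msg in member.log:
            if isinstance(msg, str) and msg not in pending:
                pending.append(msg)
    # Step 2: forward missing messages so that every survivor is caught up.
    for member in survivors:
        for msg in pending:
            member.deliver(msg)
    # Step 3: only now deliver the new view, after all old-view messages.
    for member in survivors:
        member.log.append(("VIEW", new_view))


# Illustrative scenario: m2 reached only member a before member c failed.
a, b, c = Member("a"), Member("b"), Member("c")
for m in (a, b, c):
    m.deliver("m1")
a.deliver("m2")
flush_and_install([a, b], ("a", "b"))
print(a.log)
print(b.log)
```

In this scenario both survivors end with the same log, and the view entry is the last item in each, which is the sense in which the view is ordered with respect to the other messages.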

Copyright information

© 2012 Springer-Verlag London Limited

About this chapter

Cite this chapter

Birman, K.P. (2012). Group Communication Systems. In: Guide to Reliable Distributed Systems. Texts in Computer Science. Springer, London. https://doi.org/10.1007/978-1-4471-2416-0_12

  • DOI: https://doi.org/10.1007/978-1-4471-2416-0_12

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-2415-3

  • Online ISBN: 978-1-4471-2416-0

  • eBook Packages: Computer Science (R0)
