A Membership Protocol Based on Partial Order
Membership information is used to provide a consistent, system-wide view of which processes are currently functioning or failed in a distributed computation. This paper describes a membership protocol that is used to maintain this information. Our protocol is novel because it is based on a multicast facility that preserves only the partial order of messages exchanged among the communicating processes. Because it depends only on a partial ordering of messages rather than a total ordering, our protocol requires less synchronization overhead. The advantages of our approach are especially pronounced if multiple failures occur concurrently.
Keywords: Partial Order, Logical Time, Membership List, Context Graph, Stability Check