Reliable broadcast in synchronous and asynchronous environments (preliminary version)
This paper studies the problem of reliable broadcast of a sequence of values in a system subject to processor failures. We consider three failure models: crash, in which a processor may stop executing at any time; send omission, in which processors may intermittently fail to send messages; and general omission, in which processors may intermittently fail to send and receive messages. Each model is studied in both synchronous (the “round model”) and asynchronous systems. In contrast to the Byzantine Generals formulation of reliable broadcast, the problem we consider can be solved for asynchronous systems. For synchronous systems, we first present an algorithm tolerant of crash failures, and then use translation techniques to derive algorithms tolerant of send omission failures and general omission failures. For asynchronous systems, we present simple algorithms tolerant of all three failure models.
Keywords: Asynchronous System, Synchronous System, Input Tape, Crash Failure, Faulty Processor
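The abstract does not reproduce the paper's algorithms, so the following is only an illustrative sketch of the crash failure model, using the familiar relay-before-deliver idea rather than the authors' own construction. It simulates the broadcast of a single value (not a sequence), and the function name `relay_broadcast` and the parameter `crash_after` are hypothetical names introduced for this example.

```python
from collections import deque


def relay_broadcast(n, sender, value, crash_after):
    """Simulate relay-based broadcast of one value among processes 0..n-1.

    crash_after (hypothetical parameter) maps a process id to the number of
    point-to-point sends it completes before crashing; processes not listed
    never crash. Returns {process id: delivered value} for every process
    that delivered.
    """
    sends_done = {p: 0 for p in range(n)}
    started = set()    # processes that have begun relaying
    delivered = {}
    in_flight = deque()  # pending messages, stored as destination ids

    def send_to_all(p):
        # Send to every other process; stop as soon as p's crash point is hit.
        for q in range(n):
            if q == p:
                continue
            if p in crash_after and sends_done[p] >= crash_after[p]:
                return False  # p crashed in the middle of its relay
            sends_done[p] += 1
            in_flight.append(q)
        return True

    def handle(p):
        # On first receipt: relay to everyone, and deliver only afterwards.
        if p in started:
            return
        started.add(p)
        if send_to_all(p):
            delivered[p] = value

    handle(sender)  # the sender treats its own input as the first receipt
    while in_flight:
        handle(in_flight.popleft())
    return delivered
```

For instance, `relay_broadcast(4, 0, "v", {0: 1})` models a sender that crashes after reaching only process 1; because process 1 relays before delivering, processes 2 and 3 still deliver the value. If the sender crashes before sending anything (`{0: 0}`), no process delivers, which is consistent with all-or-nothing delivery among the processes that do not crash.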