Abstract
We present a precise specification of the primary-backup approach. Then, for a variety of failure models, we prove lower bounds on the degree of replication, failover time, and worst-case blocking time for client requests. Finally, we outline primary-backup protocols and indicate which of our lower bounds are tight.
Supported by the Defense Advanced Research Projects Agency (DoD) under NASA Ames grant number NAG 2-593 and by grants from IBM, Siemens, and Xerox. Budhiraja is also supported by an IBM Graduate Fellowship. The views, opinions, and findings contained in this report are those of the authors and should not be construed as an official Department of Defense position, policy, or decision.
Supported in part by the Office of Naval Research under contract N00014-91-J-1219, the National Science Foundation under Grant No. CCR-8701103, DARPA/NSF Grant No. CCR-9014363, and by a grant from IBM Endicott Programming Laboratory.
Supported in part by NSF grants CCR-8901780 and CCR-9102231 and by a grant from IBM Endicott Programming Laboratory.
Copyright information
© 1993 Springer-Verlag Wien
About this paper
Cite this paper
Budhiraja, N., Marzullo, K., Schneider, F.B., Toueg, S. (1993). Primary-Backup Protocols: Lower Bounds and Optimal Implementations. In: Landwehr, C.E., Randell, B., Simoncini, L. (eds) Dependable Computing for Critical Applications 3. Dependable Computing and Fault-Tolerant Systems, vol 8. Springer, Vienna. https://doi.org/10.1007/978-3-7091-4009-3_14
DOI: https://doi.org/10.1007/978-3-7091-4009-3_14
Publisher Name: Springer, Vienna
Print ISBN: 978-3-7091-4011-6
Online ISBN: 978-3-7091-4009-3
eBook Packages: Springer Book Archive