Abstract
The state machine approach is a general method for achieving fault tolerance and implementing decentralized control in distributed systems. This paper reviews the approach and identifies abstractions needed for coordinating ensembles of state machines. Implementations of these abstractions for two different failure models—Byzantine and fail-stop—are discussed. The state machine approach is illustrated by programming several examples. Optimization and system reconfiguration techniques are explained.
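As a rough illustration of the idea summarized above, the following sketch replicates a deterministic state machine and has the client vote on the replicas' outputs. It is a minimal, single-process sketch and not the paper's own code: the names `KVStateMachine`, `FaultyReplica`, `replicas_needed`, `vote`, and `run` are hypothetical, request ordering is assumed to be already agreed (here simply a Python list), and the replica counts (2t+1 for Byzantine failures, t+1 for fail-stop) are the standard bounds for this approach.

```python
from collections import Counter


class KVStateMachine:
    """Deterministic state machine: outputs depend only on the request sequence."""

    def __init__(self):
        self.state = {}

    def apply(self, request):
        op, key, *rest = request
        if op == "write":
            self.state[key] = rest[0]
            return "ok"
        if op == "read":
            return self.state.get(key)
        raise ValueError(f"unknown operation: {op}")


class FaultyReplica(KVStateMachine):
    """Hypothetical Byzantine replica: always reports an arbitrary output."""

    def apply(self, request):
        super().apply(request)
        return "garbage"


def replicas_needed(t, failure_model):
    """Replicas required to tolerate t faulty ones under each failure model."""
    if failure_model == "byzantine":
        return 2 * t + 1  # correct replicas must outvote the faulty ones
    if failure_model == "fail-stop":
        return t + 1      # any surviving replica's output is correct
    raise ValueError(f"unknown failure model: {failure_model}")


def vote(outputs, t):
    """Return the output reported by at least t + 1 replicas."""
    value, count = Counter(outputs).most_common(1)[0]
    if count < t + 1:
        raise RuntimeError("no output has t + 1 matching votes")
    return value


def run(replicas, requests, t):
    """Feed every replica the same requests in the same order, then vote."""
    results = []
    for request in requests:
        outputs = [replica.apply(request) for replica in replicas]
        results.append(vote(outputs, t))
    return results


if __name__ == "__main__":
    t = 1  # tolerate one Byzantine replica, so 2t + 1 = 3 replicas
    replicas = [KVStateMachine(), KVStateMachine(), FaultyReplica()]
    requests = [("write", "x", 1), ("read", "x")]
    print(run(replicas, requests, t))  # ['ok', 1]
```

The sketch leaves out the hard part that the paper addresses, namely getting all non-faulty replicas to agree on the same requests in the same order; here that agreement is assumed.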
This material is based on work supported in part by the Office of Naval Research under contract N00014-86-K-0092, the National Science Foundation under Grant No. CCR-8701103, and Digital Equipment Corporation. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author and do not reflect the views of these agencies.
Copyright information
© 1990 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Schneider, F.B. (1990). The state machine approach: A tutorial. In: Simons, B., Spector, A. (eds) Fault-Tolerant Distributed Computing. Lecture Notes in Computer Science, vol 448. Springer, New York, NY. https://doi.org/10.1007/BFb0042323
DOI: https://doi.org/10.1007/BFb0042323
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-97385-2
Online ISBN: 978-0-387-34812-4
eBook Packages: Springer Book Archive