The state machine approach: A tutorial

Schneider, Fred B.

doi:10.1007/BFb0042323

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 448)

Abstract

The state machine approach is a general method for achieving fault tolerance and implementing decentralized control in distributed systems. This paper reviews the approach and identifies abstractions needed for coordinating ensembles of state machines. Implementations of these abstractions for two different failure models—Byzantine and fail-stop—are discussed. The state machine approach is illustrated by programming several examples. Optimization and system reconfiguration techniques are explained.
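
The abstract does not spell out the mechanism, so the following is only a minimal sketch under the usual reading of the approach: a service is written as a deterministic state machine, every replica processes the same requests in the same order, and a client accepts the output returned by a majority of replicas. The names CounterMachine, FaultyMachine, and run_ensemble are hypothetical and do not come from the paper, and the agreement and ordering protocols that deliver requests to replicas are assumed rather than implemented.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Request = Tuple[str, int]  # hypothetical request format: (operation, argument)


class CounterMachine:
    """Deterministic state machine: the same requests in the same order
    always yield the same state and the same outputs."""

    def __init__(self) -> None:
        self.value = 0

    def apply(self, request: Request) -> int:
        op, arg = request
        if op == "add":
            self.value += arg
        elif op == "set":
            self.value = arg
        else:
            raise ValueError(f"unknown operation: {op}")
        return self.value


class FaultyMachine(CounterMachine):
    """Hypothetical stand-in for a faulty replica: it returns a wrong output."""

    def apply(self, request: Request) -> int:
        super().apply(request)
        return -1


def run_ensemble(replicas: Sequence[CounterMachine],
                 requests: Sequence[Request]) -> List[int]:
    """Feed every request to every replica in the same order and keep the
    output that a strict majority of replicas agree on."""
    outputs = []
    for request in requests:
        answers = Counter(replica.apply(request) for replica in replicas)
        value, votes = answers.most_common(1)[0]
        if votes <= len(replicas) // 2:
            raise RuntimeError("no majority among replica outputs")
        outputs.append(value)
    return outputs


if __name__ == "__main__":
    replicas = [CounterMachine(), CounterMachine(), FaultyMachine()]
    print(run_ensemble(replicas, [("add", 5), ("set", 2), ("add", 3)]))
    # prints [5, 2, 5]
```

With three replicas, the vote masks one replica that returns wrong outputs. If faulty replicas simply halt instead of answering incorrectly, no vote is needed and any single reply can be trusted, which is why the two failure models named in the abstract call for different implementations.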

This material is based on work supported in part by the Office of Naval Research under contract N00014-86-K-0092, the National Science Foundation under Grant No. CCR-8701103, and Digital Equipment Corporation. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author and do not reflect the views of these agencies.

Author information

Authors and Affiliations

Department of Computer Science, Cornell University, Ithaca, New York 14853
Fred B. Schneider

Editor information

Barbara Simons, Alfred Spector

About this paper

Cite this paper

Schneider, F.B. (1990). The state machine approach: A tutorial. In: Simons, B., Spector, A. (eds) Fault-Tolerant Distributed Computing. Lecture Notes in Computer Science, vol 448. Springer, New York, NY. https://doi.org/10.1007/BFb0042323

DOI: https://doi.org/10.1007/BFb0042323
Published: 08 June 2005
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-97385-2
Online ISBN: 978-0-387-34812-4
eBook Packages: Springer Book Archive
