The state machine approach: A tutorial

Fault-Tolerant Distributed Computing

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 448)

Abstract

The state machine approach is a general method for achieving fault tolerance and implementing decentralized control in distributed systems. This paper reviews the approach and identifies abstractions needed for coordinating ensembles of state machines. Implementations of these abstractions for two different failure models—Byzantine and fail-stop—are discussed. The state machine approach is illustrated by programming several examples. Optimization and system reconfiguration techniques are explained.
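As a rough illustration of the idea summarized in the abstract, the sketch below replicates a deterministic state machine, feeds every replica the same ordered request sequence, and takes a majority vote over the outputs. It is a minimal sketch under stated assumptions, not the paper's own code: the key-value service and the names `KVStateMachine`, `FaultyReplica`, `replicas_needed`, `vote`, and `run` are hypothetical, agreement on request order is simply assumed rather than implemented, and the replica counts (2t+1 for Byzantine failures, t+1 for fail-stop) are the standard bounds for tolerating t faulty replicas.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical request format: (operation, key, value).
Request = Tuple[str, str, Optional[int]]


class KVStateMachine:
    """Deterministic state machine: outputs depend only on the request sequence."""

    def __init__(self) -> None:
        self.state: Dict[str, Optional[int]] = {}

    def apply(self, request: Request) -> Optional[int]:
        op, key, value = request
        if op == "write":
            self.state[key] = value
            return value
        if op == "read":
            return self.state.get(key)
        raise ValueError(f"unknown operation: {op}")


class FaultyReplica(KVStateMachine):
    """Hypothetical faulty replica that always reports a wrong output."""

    def apply(self, request: Request) -> Optional[int]:
        super().apply(request)
        return -1


def replicas_needed(t: int, failure_model: str) -> int:
    """Replicas required to tolerate t faulty replicas under each failure model."""
    if failure_model == "byzantine":
        return 2 * t + 1  # correct replicas must outvote the faulty ones
    if failure_model == "fail-stop":
        return t + 1  # any surviving replica's output can be trusted
    raise ValueError(f"unknown failure model: {failure_model}")


def vote(outputs: List[Optional[int]]) -> Optional[int]:
    """Return the output produced by a strict majority of replicas."""
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise RuntimeError("no majority among replica outputs")
    return value


def run(replicas: List[KVStateMachine], requests: List[Request]) -> List[Optional[int]]:
    """Apply the same ordered requests to every replica and vote on each output."""
    return [vote([r.apply(req) for r in replicas]) for req in requests]


if __name__ == "__main__":
    # One Byzantine-style fault (t = 1) is masked by three replicas.
    ensemble = [KVStateMachine(), KVStateMachine(), FaultyReplica()]
    print(run(ensemble, [("write", "x", 5), ("read", "x", None)]))
```

The voting step only masks faults because every correct replica processes the same requests in the same order; providing that agreement and ordering is the coordination problem the paper addresses.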

This material is based on work supported in part by the Office of Naval Research under contract N00014-86-K-0092, the National Science Foundation under Grant No. CCR-8701103, and Digital Equipment Corporation. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author and do not reflect the views of these agencies.

Editor information

Barbara Simons, Alfred Spector

Copyright information

© 1990 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schneider, F.B. (1990). The state machine approach: A tutorial. In: Simons, B., Spector, A. (eds) Fault-Tolerant Distributed Computing. Lecture Notes in Computer Science, vol 448. Springer, New York, NY. https://doi.org/10.1007/BFb0042323

  • DOI: https://doi.org/10.1007/BFb0042323

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-0-387-97385-2

  • Online ISBN: 978-0-387-34812-4

  • eBook Packages: Springer Book Archive
