Skip to main content

Towards Modeling and Model Checking Fault-Tolerant Distributed Algorithms

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 7976))

Abstract

Fault-tolerant distributed algorithms are central for building reliable, spatially distributed systems. In order to ensure that these algorithms actually make systems more reliable, we must ensure that these algorithms are actually correct. Unfortunately, model checking state-of-the-art fault-tolerant distributed algorithms (such as Paxos) is currently out of reach except for very small systems.

In order to be eventually able to automatically verify such fault-tolerant distributed algorithms also for larger systems, several problems have to be addressed. In this paper, we consider modeling and verification of fault-tolerant algorithms that basically only contain threshold guards to control the flow of the algorithm. As threshold guards are widely used in fault-tolerant distributed algorithms (and also in Paxos), efficient methods to handle them bring us closer to the above mentioned goal.

As a case study we use the reliable broadcasting algorithm by Srikanth and Toueg that tolerates even Byzantine faults. We show how one can model this basic fault-tolerant distributed algorithm in Promela such that safety and liveness properties can be efficiently verified in Spin. We provide experimental data also for other distributed algorithms.

Supported by the Austrian National Research Network S11403 and S11405 (RiSE) of the Austrian Science Fund (FWF) and by the Vienna Science and Technology Fund (WWTF) through grants PROSEED, ICT12-059, and VRG11-005.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Ademaj, A.: Slightly-off-specification failures in the time-triggered architecture. In: High-Level Design Validation and Test Workshop, pp. 7–12. IEEE (2002)

    Google Scholar 

  2. Aguilera, M.K., Delporte-Gallet, C., Fauconnier, H., Toueg, S.: Consensus with Byzantine failures and little system synchrony. In: DSN, pp. 147–155 (2006)

    Google Scholar 

  3. Attiya, H., Welch, J.: Distributed Computing, 2nd edn. John Wiley & Sons (2004)

    Google Scholar 

  4. Biely, M., Schmid, U., Weiss, B.: Synchronous consensus under hybrid process and link failures. Theoretical Computer Science 412(40), 5602–5630 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  5. Bokor, P., Kinder, J., Serafini, M., Suri, N.: Efficient model checking of fault-tolerant distributed protocols. In: DSN, pp. 73–84 (2011)

    Google Scholar 

  6. Bracha, G., Toueg, S.: Asynchronous consensus and broadcast protocols. J. ACM 32(4), 824–840 (1985)

    Article  MathSciNet  Google Scholar 

  7. Browne, M.C., Clarke, E.M., Grumberg, O.: Reasoning about networks with many identical finite state processes. Inf. Comput. 81, 13–31 (1989)

    Article  MathSciNet  MATH  Google Scholar 

  8. Bucchiarone, A., Muccini, H., Pelliccione, P.: Architecting fault-tolerant component-based systems: from requirements to testing. Electr. Notes Theor. Comput. Sci. 168, 77–90 (2007)

    Article  Google Scholar 

  9. Chandra, T.D., Toueg, S.: Unreliable failure detectors for reliable distributed systems. J. ACM 43(2), 225–267 (1996)

    Article  MathSciNet  MATH  Google Scholar 

  10. Charron-Bost, B., Schiper, A.: The heard-of model: computing in distributed systems with benign faults. Distributed Computing 22(1), 49–71 (2009)

    Article  Google Scholar 

  11. Clarke, E., Talupur, M., Veith, H.: Proving Ptolemy right: the environment abstraction framework for model checking concurrent systems. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 33–47. Springer, Heidelberg (2008)

    Chapter  Google Scholar 

  12. Clarke, E., Talupur, M., Touili, T., Veith, H.: Verification by network decomposition. In: Gardner, P., Yoshida, N. (eds.) CONCUR 2004. LNCS, vol. 3170, pp. 276–291. Springer, Heidelberg (2004)

    Chapter  Google Scholar 

  13. Dwork, C., Lynch, N., Stockmeyer, L.: Consensus in the presence of partial synchrony. J. ACM 35(2), 288–323 (1988)

    Article  MathSciNet  Google Scholar 

  14. Emerson, E., Namjoshi, K.: Reasoning about rings. In: POPL, pp. 85–94 (1995)

    Google Scholar 

  15. Feather, M.S., Fickas, S., Razermera-Mamy, N.A.: Model-checking for validation of a fault protection system. In: HASE, pp. 32–41 (2001)

    Google Scholar 

  16. Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of distributed consensus with one faulty process. J. ACM 32(2), 374–382 (1985)

    Article  MathSciNet  MATH  Google Scholar 

  17. Fisman, D., Kupferman, O., Lustig, Y.: On verifying fault tolerance of distributed protocols. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 315–331. Springer, Heidelberg (2008)

    Chapter  Google Scholar 

  18. Függer, M., Schmid, U.: Reconciling fault-tolerant distributed computing and systems-on-chip. Distributed Computing 24(6), 323–355 (2012)

    Article  MATH  Google Scholar 

  19. Gnesi, S., Latella, D., Lenzini, G., Abbaneo, C., Amendola, A., Marmo, P.: A formal specification and validation of a critical system in presence of Byzantine errors. In: Graf, S. (ed.) TACAS 2000. LNCS, vol. 1785, pp. 535–549. Springer, Heidelberg (2000)

    Chapter  Google Scholar 

  20. Holzmann, G.: The SPIN Model Checker: Primer and Reference Manual. Addison-Wesley Professional (2003)

    Google Scholar 

  21. John, A., Konnov, I., Schmid, U., Veith, H., Widder, J.: Brief announcement: Parameterized model checking of fault-tolerant distributed algorithms by abstraction. In: ACM PODC (to appear, 2013) (long version at arXiv CoRR abs/1210.3846)

    Google Scholar 

  22. Lamport, L.: A new solution of Dijkstra’s concurrent programming problem. Commun. ACM 17(8), 453–455 (1974)

    Article  MathSciNet  MATH  Google Scholar 

  23. Lamport, L.: On interprocess communication. Part I: Basic formalism. Distributed Computing 1(2), 77–85 (1986)

    Article  MATH  Google Scholar 

  24. Lynch, N.: Distributed Algorithms. Morgan Kaufman, San Francisco (1996)

    MATH  Google Scholar 

  25. Malekpour, M.R., Siminiceanu, R.: Comments on the “Byzantine self-stabilizing pulse synchronization”. protocol: Counterexamples. Tech. rep., NASA (February 2006)

    Google Scholar 

  26. Martin, J.P., Alvisi, L.: Fast Byzantine consensus. IEEE Trans. Dep. Sec. Comp. 3(3), 202–215 (2006)

    Article  Google Scholar 

  27. Mostéfaoui, A., Mourgaya, E., Parvédy, P.R., Raynal, M.: Evaluating the condition-based approach to solve consensus. In: DSN, pp. 541–550 (2003)

    Google Scholar 

  28. Pease, M., Shostak, R., Lamport, L.: Reaching agreement in the presence of faults. J. ACM 27(2), 228–234 (1980)

    Article  MathSciNet  MATH  Google Scholar 

  29. Pnueli, A., Xu, J., Zuck, L.: Liveness with (0,1,∞)- counter abstraction. In: Brinksma, E., Larsen, K.G. (eds.) CAV 2002. LNCS, vol. 2404, pp. 107–122. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  30. Powell, D.: Failure mode assumptions and assumption coverage. In: FTCS-22, Boston, MA, USA, pp. 386–395 (1992)

    Google Scholar 

  31. Santoro, N., Widmayer, P.: Time is not a healer. In: Cori, R., Monien, B. (eds.) STACS 1989. LNCS, vol. 349, pp. 304–313. Springer, Heidelberg (1989)

    Chapter  Google Scholar 

  32. Schneider, F., Easterbrook, S.M., Callahan, J.R., Holzmann, G.J.: Validating requirements for fault tolerant systems using model checking. In: ICRE, pp. 4–13 (1998)

    Google Scholar 

  33. Schneider, F.B.: Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Comput. Surv. 22(4), 299–319 (1990)

    Article  Google Scholar 

  34. Srikanth, T., Toueg, S.: Simulating authenticated broadcasts to derive simple fault-tolerant algorithms. Distributed Computing 2, 80–94 (1987)

    Article  Google Scholar 

  35. Steiner, W., Rushby, J.M., Sorea, M., Pfeifer, H.: Model checking a fault-tolerant startup algorithm: From design exploration to exhaustive fault simulation. In: DSN, pp. 189–198 (2004)

    Google Scholar 

  36. Tsuchiya, T., Schiper, A.: Verification of consensus algorithms using satisfiability solving. Distributed Computing 23(5-6), 341–358 (2011)

    Article  MATH  Google Scholar 

  37. Widder, J., Schmid, U.: Booting clock synchronization in partially synchronous systems with hybrid process and link failures. Distributed Computing 20(2), 115–140 (2007)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

John, A., Konnov, I., Schmid, U., Veith, H., Widder, J. (2013). Towards Modeling and Model Checking Fault-Tolerant Distributed Algorithms. In: Bartocci, E., Ramakrishnan, C.R. (eds) Model Checking Software. SPIN 2013. Lecture Notes in Computer Science, vol 7976. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39176-7_14

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-39176-7_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39175-0

  • Online ISBN: 978-3-642-39176-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics