Towards Modeling and Model Checking Fault-Tolerant Distributed Algorithms

John, Annu; Konnov, Igor; Schmid, Ulrich; Veith, Helmut; Widder, Josef

doi:10.1007/978-3-642-39176-7_14

Towards Modeling and Model Checking Fault-Tolerant Distributed Algorithms

Annu John¹⁸,
Igor Konnov¹⁸,
Ulrich Schmid¹⁸,
Helmut Veith¹⁸ &
…
Josef Widder¹⁸

Conference paper

766 Accesses
27 Citations

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 7976))

Abstract

Fault-tolerant distributed algorithms are central for building reliable, spatially distributed systems. In order to ensure that these algorithms actually make systems more reliable, we must ensure that these algorithms are actually correct. Unfortunately, model checking state-of-the-art fault-tolerant distributed algorithms (such as Paxos) is currently out of reach except for very small systems.

In order to be eventually able to automatically verify such fault-tolerant distributed algorithms also for larger systems, several problems have to be addressed. In this paper, we consider modeling and verification of fault-tolerant algorithms that basically only contain threshold guards to control the flow of the algorithm. As threshold guards are widely used in fault-tolerant distributed algorithms (and also in Paxos), efficient methods to handle them bring us closer to the above mentioned goal.

As a case study we use the reliable broadcasting algorithm by Srikanth and Toueg that tolerates even Byzantine faults. We show how one can model this basic fault-tolerant distributed algorithm in Promela such that safety and liveness properties can be efficiently verified in Spin. We provide experimental data also for other distributed algorithms.

Supported by the Austrian National Research Network S11403 and S11405 (RiSE) of the Austrian Science Fund (FWF) and by the Vienna Science and Technology Fund (WWTF) through grants PROSEED, ICT12-059, and VRG11-005.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Ademaj, A.: Slightly-off-specification failures in the time-triggered architecture. In: High-Level Design Validation and Test Workshop, pp. 7–12. IEEE (2002)
Google Scholar
Aguilera, M.K., Delporte-Gallet, C., Fauconnier, H., Toueg, S.: Consensus with Byzantine failures and little system synchrony. In: DSN, pp. 147–155 (2006)
Google Scholar
Attiya, H., Welch, J.: Distributed Computing, 2nd edn. John Wiley & Sons (2004)
Google Scholar
Biely, M., Schmid, U., Weiss, B.: Synchronous consensus under hybrid process and link failures. Theoretical Computer Science 412(40), 5602–5630 (2011)
Article MathSciNet MATH Google Scholar
Bokor, P., Kinder, J., Serafini, M., Suri, N.: Efficient model checking of fault-tolerant distributed protocols. In: DSN, pp. 73–84 (2011)
Google Scholar
Bracha, G., Toueg, S.: Asynchronous consensus and broadcast protocols. J. ACM 32(4), 824–840 (1985)
Article MathSciNet Google Scholar
Browne, M.C., Clarke, E.M., Grumberg, O.: Reasoning about networks with many identical finite state processes. Inf. Comput. 81, 13–31 (1989)
Article MathSciNet MATH Google Scholar
Bucchiarone, A., Muccini, H., Pelliccione, P.: Architecting fault-tolerant component-based systems: from requirements to testing. Electr. Notes Theor. Comput. Sci. 168, 77–90 (2007)
Article Google Scholar
Chandra, T.D., Toueg, S.: Unreliable failure detectors for reliable distributed systems. J. ACM 43(2), 225–267 (1996)
Article MathSciNet MATH Google Scholar
Charron-Bost, B., Schiper, A.: The heard-of model: computing in distributed systems with benign faults. Distributed Computing 22(1), 49–71 (2009)
Article Google Scholar
Clarke, E., Talupur, M., Veith, H.: Proving Ptolemy right: the environment abstraction framework for model checking concurrent systems. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 33–47. Springer, Heidelberg (2008)
Chapter Google Scholar
Clarke, E., Talupur, M., Touili, T., Veith, H.: Verification by network decomposition. In: Gardner, P., Yoshida, N. (eds.) CONCUR 2004. LNCS, vol. 3170, pp. 276–291. Springer, Heidelberg (2004)
Chapter Google Scholar
Dwork, C., Lynch, N., Stockmeyer, L.: Consensus in the presence of partial synchrony. J. ACM 35(2), 288–323 (1988)
Article MathSciNet Google Scholar
Emerson, E., Namjoshi, K.: Reasoning about rings. In: POPL, pp. 85–94 (1995)
Google Scholar
Feather, M.S., Fickas, S., Razermera-Mamy, N.A.: Model-checking for validation of a fault protection system. In: HASE, pp. 32–41 (2001)
Google Scholar
Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of distributed consensus with one faulty process. J. ACM 32(2), 374–382 (1985)
Article MathSciNet MATH Google Scholar
Fisman, D., Kupferman, O., Lustig, Y.: On verifying fault tolerance of distributed protocols. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 315–331. Springer, Heidelberg (2008)
Chapter Google Scholar
Függer, M., Schmid, U.: Reconciling fault-tolerant distributed computing and systems-on-chip. Distributed Computing 24(6), 323–355 (2012)
Article MATH Google Scholar
Gnesi, S., Latella, D., Lenzini, G., Abbaneo, C., Amendola, A., Marmo, P.: A formal specification and validation of a critical system in presence of Byzantine errors. In: Graf, S. (ed.) TACAS 2000. LNCS, vol. 1785, pp. 535–549. Springer, Heidelberg (2000)
Chapter Google Scholar
Holzmann, G.: The SPIN Model Checker: Primer and Reference Manual. Addison-Wesley Professional (2003)
Google Scholar
John, A., Konnov, I., Schmid, U., Veith, H., Widder, J.: Brief announcement: Parameterized model checking of fault-tolerant distributed algorithms by abstraction. In: ACM PODC (to appear, 2013) (long version at arXiv CoRR abs/1210.3846)
Google Scholar
Lamport, L.: A new solution of Dijkstra’s concurrent programming problem. Commun. ACM 17(8), 453–455 (1974)
Article MathSciNet MATH Google Scholar
Lamport, L.: On interprocess communication. Part I: Basic formalism. Distributed Computing 1(2), 77–85 (1986)
Article MATH Google Scholar
Lynch, N.: Distributed Algorithms. Morgan Kaufman, San Francisco (1996)
MATH Google Scholar
Malekpour, M.R., Siminiceanu, R.: Comments on the “Byzantine self-stabilizing pulse synchronization”. protocol: Counterexamples. Tech. rep., NASA (February 2006)
Google Scholar
Martin, J.P., Alvisi, L.: Fast Byzantine consensus. IEEE Trans. Dep. Sec. Comp. 3(3), 202–215 (2006)
Article Google Scholar
Mostéfaoui, A., Mourgaya, E., Parvédy, P.R., Raynal, M.: Evaluating the condition-based approach to solve consensus. In: DSN, pp. 541–550 (2003)
Google Scholar
Pease, M., Shostak, R., Lamport, L.: Reaching agreement in the presence of faults. J. ACM 27(2), 228–234 (1980)
Article MathSciNet MATH Google Scholar
Pnueli, A., Xu, J., Zuck, L.: Liveness with (0,1,∞)- counter abstraction. In: Brinksma, E., Larsen, K.G. (eds.) CAV 2002. LNCS, vol. 2404, pp. 107–122. Springer, Heidelberg (2002)
Chapter Google Scholar
Powell, D.: Failure mode assumptions and assumption coverage. In: FTCS-22, Boston, MA, USA, pp. 386–395 (1992)
Google Scholar
Santoro, N., Widmayer, P.: Time is not a healer. In: Cori, R., Monien, B. (eds.) STACS 1989. LNCS, vol. 349, pp. 304–313. Springer, Heidelberg (1989)
Chapter Google Scholar
Schneider, F., Easterbrook, S.M., Callahan, J.R., Holzmann, G.J.: Validating requirements for fault tolerant systems using model checking. In: ICRE, pp. 4–13 (1998)
Google Scholar
Schneider, F.B.: Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Comput. Surv. 22(4), 299–319 (1990)
Article Google Scholar
Srikanth, T., Toueg, S.: Simulating authenticated broadcasts to derive simple fault-tolerant algorithms. Distributed Computing 2, 80–94 (1987)
Article Google Scholar
Steiner, W., Rushby, J.M., Sorea, M., Pfeifer, H.: Model checking a fault-tolerant startup algorithm: From design exploration to exhaustive fault simulation. In: DSN, pp. 189–198 (2004)
Google Scholar
Tsuchiya, T., Schiper, A.: Verification of consensus algorithms using satisfiability solving. Distributed Computing 23(5-6), 341–358 (2011)
Article MATH Google Scholar
Widder, J., Schmid, U.: Booting clock synchronization in partially synchronous systems with hybrid process and link failures. Distributed Computing 20(2), 115–140 (2007)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Vienna University of Technology (TU Wien), Austria
Annu John, Igor Konnov, Ulrich Schmid, Helmut Veith & Josef Widder

Authors

Annu John
View author publications
You can also search for this author in PubMed Google Scholar
Igor Konnov
View author publications
You can also search for this author in PubMed Google Scholar
Ulrich Schmid
View author publications
You can also search for this author in PubMed Google Scholar
Helmut Veith
View author publications
You can also search for this author in PubMed Google Scholar
Josef Widder
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Department of Computer Engineering, Vienna, University of Technology, Austria
Ezio Bartocci
Department of Computer Science, Stony Brook University, 11794-4400, Stony Brook, NY, USA
C. R. Ramakrishnan

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

John, A., Konnov, I., Schmid, U., Veith, H., Widder, J. (2013). Towards Modeling and Model Checking Fault-Tolerant Distributed Algorithms. In: Bartocci, E., Ramakrishnan, C.R. (eds) Model Checking Software. SPIN 2013. Lecture Notes in Computer Science, vol 7976. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39176-7_14

Download citation

DOI: https://doi.org/10.1007/978-3-642-39176-7_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39175-0
Online ISBN: 978-3-642-39176-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics