Abstract
Distributed systems are composed of processes connected in some network. Distributed systems may suffer from faults: processes may stop, be interrupted, or be maliciously attacked. Fault-tolerant protocols are designed to be resistant to faults. Proving the resistance of protocols to faults is a very challenging problem, as it combines the parameterized setting that distributed systems are based-on, with the need to consider a hostile environment that produces the faults. Considering all the possible fault scenarios for a protocol is very difficult. Thus, reasoning about fault-tolerance protocols utterly needs formal methods.
In this paper we describe a framework for verifying the fault tolerance of (synchronous or asynchronous) distributed protocols. In addition to the description of the protocol and the desired behavior, the user provides the fault type (e.g., fail-stop, Byzantine) and its distribution (e.g., at most half of the processes are faulty). Our framework is based on augmenting the description of the configurations of the system by a mask describing which processes are faulty. We focus on regular model checking and show how it is possible to compile the input for the model-checking problem to one that takes the faults and their distribution into an account, and perform regular model-checking on the compiled input. We demonstrate the effectiveness of our framework and argue for its generality.
Chapter PDF
References
Abdulla, P.A., d’Orso, J., Jonsson, B., Nilsson, M.: Algorithmic improvements in regular model checking. In: Hunt Jr., W.A., Somenzi, F. (eds.) CAV 2003. LNCS, vol. 2725, pp. 236–248. Springer, Heidelberg (2003)
Abdulla, P.A., Jonsson, B., Nilsson, M., d’Orso, J., Saksena, M.: Regular model checking for LTL(MSO). In: Alur, R., Peled, D.A. (eds.) CAV 2004. LNCS, vol. 3114, pp. 348–360. Springer, Heidelberg (2004)
Abdulla, P.A., Jonsson, B., Nilsson, M., d’Orso, J., Saksena, M.: A survey of regular model checking. In: Gardner, P., Yoshida, N. (eds.) CONCUR 2004. LNCS, vol. 3170, pp. 35–48. Springer, Heidelberg (2004)
Apt, K., Kozen, D.: Limits for automatic verification of finite-state concurrent systems. Information Processing Letters 22(6), 307–309 (1986)
Arora, A., Gouda, M.G.: Closure and convergence: A foundation of fault-tolerant computing. Software Engineering 19(11), 1015–1027 (1993)
Attie, P.C., Arora, A., Emerson, E.A.: Synthesis of fault-tolerant concurrent programs. ACM TOPLAS 26, 128–185 (2004)
Awerbuch, B.: Optimal distributed algorithms for minimum weight spanning tree, counting, leader election and related problems. In: Proc. 19th STOC, pp. 230–240 (1987)
Baier, C., Bertrand, N., Schnoebelen, P.: On computing fixpoints in well-structured regular model checking, with applications to lossy channel systems. In: Hermann, M., Voronkov, A. (eds.) LPAR 2006. LNCS (LNAI), vol. 4246, pp. 347–361. Springer, Heidelberg (2006)
Bardin, S., Finkel, A., Leroux, J., Schnoebelen, P.: Flat acceleration in symbolic model checking. In: Peled, D.A., Tsay, Y.-K. (eds.) ATVA 2005. LNCS, vol. 3707, pp. 474–488. Springer, Heidelberg (2005)
Bouajjani, A., Habermehl, P., Vojnar, T.: Abstract regular model checking. In: Alur, R., Peled, D.A. (eds.) CAV 2004. LNCS, vol. 3114, pp. 372–386. Springer, Heidelberg (2004)
Büchi, J.R.: Weak second-order arithmetic and finite automata. Zeit. Math. Logik und Grundl. Math. 6, 66–92 (1960)
Keneddey Space Center. NASA space shuttle launch archive, mission STS-1 (1981) http://science.ksc.nasa.gov/shuttle/missions/sts-1/mission-sts-1.html
Daliot, A., Dolev, D., Parnas, H.: Linear time byzantine self-stabilizing clock synchronization. In: Proc. of 7th PODC, pp. 7–19 (2003)
Dolev, D., Strong, H.R.: Authenticated algorithms for byzantine agreement. SIAM Journal on Computing 12, 656–666 (1983)
Elgaard, J., Klarlund, N., Möller, A.: Mona 1.x: new techniques for WS1S and WS2S. In: Y. Vardi, M. (ed.) CAV 1998. LNCS, vol. 1427, pp. 516–520. Springer, Heidelberg (1998)
Elgot, C.: Decision problems of finite-automata design and related arithmetics. Trans. Amer. Math. Soc. 98, 21–51 (1961)
Emerson, E.A., Kahlon, V.: Reducing model checking of the many to the few. In: Proc. 17th CAD, pp. 236–255 (2000)
Emerson, E.A., Namjoshi, K.S.: On reasoning about rings. IJFCS 14(4), 527–550 (2003)
Faloutsos, M., Molle, M.: Optimal distributed algorithm for minimum spanning trees revisited. In: Proc. 14th PODC, pp. 231–237 (1995)
Fang, Y., Piterman, N., Pnueli, A., Zuck, L.: Liveness with invisible ranking. STTT 8(3), 261–279 (2004)
Fisman, D., Pnueli, A.: Beyond regular model checking. In: Hariharan, R., Mukund, M., Vinay, V. (eds.) FSTTCS 2001. LNCS, vol. 2245, Springer, Heidelberg (2001)
Habermehl, P., Vojnar, T.: Regular model checking using inference of regular languages. ENTCS 138(3), 21–36 (2005)
Kesten, Y., Maler, O., Marcus, M., Pnueli, A., Shahar, E.: Symbolic model checking with rich assertional languages. TCS 256, 93–112 (2001)
Lesens, D., Halbwachs, N., Raymond, P.: Automatic verification of parameterized linear networks of processes. In: Proc. 24th POPL, pp. 346–357 (1997)
Lynch, N.A.: Distributed Algorithms. Morgan Kaufmann, San Francisco (1996)
Malekpour, M.R.: A byzantine fault-tolerant self-stabilization synchronization protocol for distributed clock synchronization systems. TR NASA/TM-2006-214322, NASA STI (2006)
Malekpour, M.R., Sinimiceanu, R.: Comments on the byzantine self-stabilization synchronization protocol: counterexamples. TR NASA/TM-2006-213951, NASA STI, (2006)
Pnueli, A., Shahar, E.: Liveness and acceleration in parameterized verification. In: Proc. 12th CAV, pp. 328–343 (2000)
Schlichting, R.D., Schneider, F.B.: Fail-stop processors: An approach to designing fault-tolerant computing systems. Computer Systems 1(3), 222–238 (1983)
Tanenbaum, A., van Steen, M.: Distributed Systems: Principles and Paradigms. Prentice Hall, Englewood Cliffs (2007)
Thomas, W.: Automata on infinite objects. Handbook of Theoretical Computer Science, 133–191 (1990)
Vardi, M.Y., Wolper, P.: Reasoning about infinite computations. I&C 115(1), 1–37 (1994)
Wolper, P., Lovinfosse, V.: Verifying properties of large sets of processes with network invariants. In: Proc. Automatic verification methods for finite state systems, pp. 68–80 (1990)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fisman, D., Kupferman, O., Lustig, Y. (2008). On Verifying Fault Tolerance of Distributed Protocols. In: Ramakrishnan, C.R., Rehof, J. (eds) Tools and Algorithms for the Construction and Analysis of Systems. TACAS 2008. Lecture Notes in Computer Science, vol 4963. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78800-3_22
Download citation
DOI: https://doi.org/10.1007/978-3-540-78800-3_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78799-0
Online ISBN: 978-3-540-78800-3
eBook Packages: Computer ScienceComputer Science (R0)