On Verifying Fault Tolerance of Distributed Protocols

Fisman, Dana; Kupferman, Orna; Lustig, Yoad

doi:10.1007/978-3-540-78800-3_22

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4963)

Abstract

Distributed systems are composed of processes connected in some network. Distributed systems may suffer from faults: processes may stop, be interrupted, or be maliciously attacked. Fault-tolerant protocols are designed to be resistant to faults. Proving the resistance of protocols to faults is a very challenging problem, as it combines the parameterized setting that distributed systems are based on with the need to consider a hostile environment that produces the faults. Considering all the possible fault scenarios for a protocol is very difficult. Thus, reasoning about fault-tolerant protocols calls for formal methods.

In this paper we describe a framework for verifying the fault tolerance of (synchronous or asynchronous) distributed protocols. In addition to the description of the protocol and the desired behavior, the user provides the fault type (e.g., fail-stop, Byzantine) and its distribution (e.g., at most half of the processes are faulty). Our framework is based on augmenting the description of the configurations of the system by a mask describing which processes are faulty. We focus on regular model checking and show how it is possible to compile the input for the model-checking problem to one that takes the faults and their distribution into account, and to perform regular model checking on the compiled input. We demonstrate the effectiveness of our framework and argue for its generality.
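To make the idea of a fault mask concrete, here is a minimal Python sketch. It is not the paper's construction: regular model checking works symbolically over all system sizes, whereas this toy explores a single fixed number of processes explicitly. The protocol (each process adopts the OR of its own bit and its left neighbour's bit) and all function names are assumptions chosen only for illustration. A configuration is paired with a mask marking faulty processes, the fault distribution restricts which masks are allowed, and a fail-stop fault is modelled as a process that never changes its state.

```python
from itertools import product

def allowed_masks(n, max_faulty):
    """All fault masks over n processes with at most max_faulty faulty ones
    (the 'distribution' of faults; e.g. max_faulty = n // 2)."""
    return [m for m in product((False, True), repeat=n) if sum(m) <= max_faulty]

def step(config):
    """Fault-free synchronous step of a toy protocol (hypothetical):
    process i adopts the OR of its bit and its left neighbour's bit."""
    return tuple(config[i] | (config[i - 1] if i > 0 else 0)
                 for i in range(len(config)))

def masked_step(config, mask):
    """Fail-stop step: processes flagged in the mask are stopped and keep
    their current state; all other processes follow the protocol."""
    nxt = step(config)
    return tuple(old if faulty else new
                 for old, new, faulty in zip(config, nxt, mask))

def reachable(init, mask):
    """Explicit-state reachability for one fixed mask (the mask itself
    never changes along a run in this sketch)."""
    seen, frontier = {init}, [init]
    while frontier:
        succ = masked_step(frontier.pop(), mask)
        if succ not in seen:
            seen.add(succ)
            frontier.append(succ)
    return seen

def all_informed_possible(init, mask):
    """Can the run reach a configuration in which every process holds 1?"""
    return tuple(1 for _ in init) in reachable(init, mask)
```

For example, with four processes and initial configuration `(1, 0, 0, 0)`, the bit spreads to every process under the all-false mask, but a single stopped process at position 1 blocks it, so this toy protocol does not tolerate even one fail-stop fault.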

Author information

Authors and Affiliations

School of Computer Science and Engineering, Hebrew University, Jerusalem, 91904, Israel
Dana Fisman, Orna Kupferman & Yoad Lustig
IBM Haifa Research Lab, Haifa University Campus, Haifa, 31905, Israel
Dana Fisman

Editor information

C. R. Ramakrishnan, Jakob Rehof

Cite this paper

Fisman, D., Kupferman, O., Lustig, Y. (2008). On Verifying Fault Tolerance of Distributed Protocols. In: Ramakrishnan, C.R., Rehof, J. (eds) Tools and Algorithms for the Construction and Analysis of Systems. TACAS 2008. Lecture Notes in Computer Science, vol 4963. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78800-3_22

DOI: https://doi.org/10.1007/978-3-540-78800-3_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78799-0
Online ISBN: 978-3-540-78800-3
eBook Packages: Computer Science, Computer Science (R0)
