On Verifying Fault Tolerance of Distributed Protocols

  • Dana Fisman
  • Orna Kupferman
  • Yoad Lustig
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4963)


Distributed systems are composed of processes connected in some network. Distributed systems may suffer from faults: processes may stop, be interrupted, or be maliciously attacked. Fault-tolerant protocols are designed to be resistant to faults. Proving the resistance of protocols to faults is a very challenging problem, as it combines the parameterized setting that distributed systems are based-on, with the need to consider a hostile environment that produces the faults. Considering all the possible fault scenarios for a protocol is very difficult. Thus, reasoning about fault-tolerance protocols utterly needs formal methods.

In this paper we describe a framework for verifying the fault tolerance of (synchronous or asynchronous) distributed protocols. In addition to the description of the protocol and the desired behavior, the user provides the fault type (e.g., fail-stop, Byzantine) and its distribution (e.g., at most half of the processes are faulty). Our framework is based on augmenting the description of the configurations of the system by a mask describing which processes are faulty. We focus on regular model checking and show how it is possible to compile the input for the model-checking problem to one that takes the faults and their distribution into an account, and perform regular model-checking on the compiled input. We demonstrate the effectiveness of our framework and argue for its generality.


Model Check Critical Section Mutual Exclusion Regular Language Transient Fault 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


  1. 1.
    Abdulla, P.A., d’Orso, J., Jonsson, B., Nilsson, M.: Algorithmic improvements in regular model checking. In: Hunt Jr., W.A., Somenzi, F. (eds.) CAV 2003. LNCS, vol. 2725, pp. 236–248. Springer, Heidelberg (2003)Google Scholar
  2. 2.
    Abdulla, P.A., Jonsson, B., Nilsson, M., d’Orso, J., Saksena, M.: Regular model checking for LTL(MSO). In: Alur, R., Peled, D.A. (eds.) CAV 2004. LNCS, vol. 3114, pp. 348–360. Springer, Heidelberg (2004)Google Scholar
  3. 3.
    Abdulla, P.A., Jonsson, B., Nilsson, M., d’Orso, J., Saksena, M.: A survey of regular model checking. In: Gardner, P., Yoshida, N. (eds.) CONCUR 2004. LNCS, vol. 3170, pp. 35–48. Springer, Heidelberg (2004)Google Scholar
  4. 4.
    Apt, K., Kozen, D.: Limits for automatic verification of finite-state concurrent systems. Information Processing Letters 22(6), 307–309 (1986)CrossRefMathSciNetGoogle Scholar
  5. 5.
    Arora, A., Gouda, M.G.: Closure and convergence: A foundation of fault-tolerant computing. Software Engineering 19(11), 1015–1027 (1993)CrossRefGoogle Scholar
  6. 6.
    Attie, P.C., Arora, A., Emerson, E.A.: Synthesis of fault-tolerant concurrent programs. ACM TOPLAS 26, 128–185 (2004)CrossRefGoogle Scholar
  7. 7.
    Awerbuch, B.: Optimal distributed algorithms for minimum weight spanning tree, counting, leader election and related problems. In: Proc. 19th STOC, pp. 230–240 (1987)Google Scholar
  8. 8.
    Baier, C., Bertrand, N., Schnoebelen, P.: On computing fixpoints in well-structured regular model checking, with applications to lossy channel systems. In: Hermann, M., Voronkov, A. (eds.) LPAR 2006. LNCS (LNAI), vol. 4246, pp. 347–361. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  9. 9.
    Bardin, S., Finkel, A., Leroux, J., Schnoebelen, P.: Flat acceleration in symbolic model checking. In: Peled, D.A., Tsay, Y.-K. (eds.) ATVA 2005. LNCS, vol. 3707, pp. 474–488. Springer, Heidelberg (2005)CrossRefGoogle Scholar
  10. 10.
    Bouajjani, A., Habermehl, P., Vojnar, T.: Abstract regular model checking. In: Alur, R., Peled, D.A. (eds.) CAV 2004. LNCS, vol. 3114, pp. 372–386. Springer, Heidelberg (2004)Google Scholar
  11. 11.
    Büchi, J.R.: Weak second-order arithmetic and finite automata. Zeit. Math. Logik und Grundl. Math. 6, 66–92 (1960)zbMATHCrossRefGoogle Scholar
  12. 12.
    Keneddey Space Center. NASA space shuttle launch archive, mission STS-1 (1981)
  13. 13.
    Daliot, A., Dolev, D., Parnas, H.: Linear time byzantine self-stabilizing clock synchronization. In: Proc. of 7th PODC, pp. 7–19 (2003)Google Scholar
  14. 14.
    Dolev, D., Strong, H.R.: Authenticated algorithms for byzantine agreement. SIAM Journal on Computing 12, 656–666 (1983)zbMATHCrossRefMathSciNetGoogle Scholar
  15. 15.
    Elgaard, J., Klarlund, N., Möller, A.: Mona 1.x: new techniques for WS1S and WS2S. In: Y. Vardi, M. (ed.) CAV 1998. LNCS, vol. 1427, pp. 516–520. Springer, Heidelberg (1998)CrossRefGoogle Scholar
  16. 16.
    Elgot, C.: Decision problems of finite-automata design and related arithmetics. Trans. Amer. Math. Soc. 98, 21–51 (1961)CrossRefMathSciNetGoogle Scholar
  17. 17.
    Emerson, E.A., Kahlon, V.: Reducing model checking of the many to the few. In: Proc. 17th CAD, pp. 236–255 (2000)Google Scholar
  18. 18.
    Emerson, E.A., Namjoshi, K.S.: On reasoning about rings. IJFCS 14(4), 527–550 (2003)zbMATHMathSciNetGoogle Scholar
  19. 19.
    Faloutsos, M., Molle, M.: Optimal distributed algorithm for minimum spanning trees revisited. In: Proc. 14th PODC, pp. 231–237 (1995)Google Scholar
  20. 20.
    Fang, Y., Piterman, N., Pnueli, A., Zuck, L.: Liveness with invisible ranking. STTT 8(3), 261–279 (2004)CrossRefGoogle Scholar
  21. 21.
    Fisman, D., Pnueli, A.: Beyond regular model checking. In: Hariharan, R., Mukund, M., Vinay, V. (eds.) FSTTCS 2001. LNCS, vol. 2245, Springer, Heidelberg (2001)CrossRefGoogle Scholar
  22. 22.
    Habermehl, P., Vojnar, T.: Regular model checking using inference of regular languages. ENTCS 138(3), 21–36 (2005)MathSciNetGoogle Scholar
  23. 23.
    Kesten, Y., Maler, O., Marcus, M., Pnueli, A., Shahar, E.: Symbolic model checking with rich assertional languages. TCS 256, 93–112 (2001)zbMATHCrossRefMathSciNetGoogle Scholar
  24. 24.
    Lesens, D., Halbwachs, N., Raymond, P.: Automatic verification of parameterized linear networks of processes. In: Proc. 24th POPL, pp. 346–357 (1997)Google Scholar
  25. 25.
    Lynch, N.A.: Distributed Algorithms. Morgan Kaufmann, San Francisco (1996)zbMATHGoogle Scholar
  26. 26.
    Malekpour, M.R.: A byzantine fault-tolerant self-stabilization synchronization protocol for distributed clock synchronization systems. TR NASA/TM-2006-214322, NASA STI (2006) Google Scholar
  27. 27.
    Malekpour, M.R., Sinimiceanu, R.: Comments on the byzantine self-stabilization synchronization protocol: counterexamples. TR NASA/TM-2006-213951, NASA STI, (2006)Google Scholar
  28. 28.
    Pnueli, A., Shahar, E.: Liveness and acceleration in parameterized verification. In: Proc. 12th CAV, pp. 328–343 (2000)Google Scholar
  29. 29.
    Schlichting, R.D., Schneider, F.B.: Fail-stop processors: An approach to designing fault-tolerant computing systems. Computer Systems 1(3), 222–238 (1983)Google Scholar
  30. 30.
    Tanenbaum, A., van Steen, M.: Distributed Systems: Principles and Paradigms. Prentice Hall, Englewood Cliffs (2007)zbMATHGoogle Scholar
  31. 31.
    Thomas, W.: Automata on infinite objects. Handbook of Theoretical Computer Science, 133–191 (1990)Google Scholar
  32. 32.
    Vardi, M.Y., Wolper, P.: Reasoning about infinite computations. I&C 115(1), 1–37 (1994)zbMATHMathSciNetGoogle Scholar
  33. 33.
    Wolper, P., Lovinfosse, V.: Verifying properties of large sets of processes with network invariants. In: Proc. Automatic verification methods for finite state systems, pp. 68–80 (1990)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Dana Fisman
    • 1
    • 2
  • Orna Kupferman
    • 1
  • Yoad Lustig
    • 1
  1. 1.School of Computer Science and EngineeringHebrew UniversityJerusalemIsrael
  2. 2.IBM Haifa Research LabHaifa University CampusHaifaIsrael

Personalised recommendations