Semantics and Logic for Provable Fault-Tolerance, A Tutorial

  • Tomasz Janowski
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1538)


This tutorial is about the design, and the proof of design, of reliable systems built from unreliable components. It teaches the concepts and techniques of fault-tolerance while building a formal theory in which this property can be specified and verified. The theory eventually supports a range of useful design techniques, especially for multiple faults. We extend CCS, its bisimulation equivalence and modal logic, under the driving principle that any claim about fault-tolerance should be invariant under the removal of faults from the assumptions (faults are unpredictable); this principle rejects the reduction of fault-tolerance to “correctness under all anticipated faults”. The theory is applied to a range of examples and eventually extended to cover fault-tolerance and timing under scheduling on limited resources. This document describes the motivation and the contents of the tutorial.





Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Tomasz Janowski, The United Nations University, International Institute for Software Technology, Macau
