A rigorous approach to fault-tolerant system development
This paper investigates the issue of what it means for a system to behave correctly despite of hardware fault occurrences. Using a stable storage system as a running example, a framework is presented for specifying, understanding, and verifying the correctness of fault-tolerant systems.
A clear separation is made between the notions of software correctness and system reliability in the face of hardware malfunction. Correctness is established by using a programming logic augmented with fault axioms and rules. Stochastic modelling is employed to investigate reliability/availability system properties.
Index TermsCorrectness Fault-Tolerance Reliability
Unable to display preview. Download preview PDF.
- [BC]Best E. and F. Cristian "Systematic Detection of Exception Occurrneces", Science of Computer Programming, Vol 1, No 1, (1981).Google Scholar
- [BR]Best E. and B. Randell "A Formal Model of Atomicity in Asynchronous Systems", Acta Informatica, Vol 16, pp. 93–124, (1981).Google Scholar
- [CA]Costes, A. et al., "SURF: A Program for Dependability Evaluation of Complex Fault-Tolerant Systems," IEEE 11th Int. Conf. on Fault-Tolerant Computing, pp. 72–78 (1981).Google Scholar
- [C1]Cristian, F., "Robust Data Types" Acta Informatica Vol. 17, pp. 365–397, (1982).Google Scholar
- [C2]Cristian, F., "Correct and Robust Programs" IBM Research Report RJ3753 (1983). To appear in IEEE Transactions on Software Engineering.Google Scholar
- [C3]Cristian, F., "A Rigorous Approach to Fault-Tolerant System Development" IBM Research Report RJ3754 (January 1983)Google Scholar
- [dB]de Bakker, J. Mathematical Theory of Program Correctness Prentice Hall, (1980).Google Scholar
- [Fl]Floyd, R.W., "Assigning Meanings to Programs", in Mathematical Aspects of Computer Science, XIX American Mathematical Society, pp. 19–32, (1967).Google Scholar
- [GM]Gelenbe, E. and I. Mitrani, Analysis and Synthesis of Computer Systems, Academic Press (1980).Google Scholar
- [LS]Lampson, B. W. and H. E. Sturgis, "Crash Recovery in a Distributed Data Storage System," Xerox PARC Report, Palo Alto, Calif. (April 1979).Google Scholar
- [MS]Melliar-Smith, P. M. and R. L. Schwartz, "Formal Specification and Mechanical Verification of SIFT: A Fault-Tolerant Flight Control System," IEEE Trans. on Computers, Vol. C-31(7) (1982).Google Scholar
- [NA]Ng, Y. W. and A. Avizienis, "ARIES: An Automated Reliability Evaluation System," Proc. 1977 Annual Reliability and Maintainability Symposium, pp. 182–188 (1977).Google Scholar
- [SS]Schlichting, R. D. and F. B. Schneider, "Verification of Fault-Tolerant Software," TR 80–446, Cornell University (1980).Google Scholar
- [WA]Wensley, J. H. et al., "SIFT: Design and Analysis of a Fault-Tolerant Computer for Aircraft Control," Proceedings of the IEEE vol. 66(10), pp. 1240–1255 (October 1978).Google Scholar