Error Recovery

  • Peter Alan Lee
  • Thomas Anderson
Part of the Dependable Computing and Fault-Tolerant Systems book series (DEPENDABLECOMP, volume 3)


The previous two chapters have discussed in some detail the first two phases in the provision of fault tolerance in a system, namely the detection of errors and the subsequent assessment of the extent of damage to the system state. These two phases are passive in the sense that they are not intended to effect any changes to the system. In contrast, the two remaining phases are active, since they do change the system and thereby enable faults and their consequences to be tolerated. This chapter addresses the topic of error recovery, the aim of which is to eliminate errors from the system state. Chapter 8 discusses the fault treatment phase of fault tolerance, which attempts to clear faults from a system so that further errors are not generated, thus ensuring that continued service can be provided.


Keywords: Recovery Data; Error Recovery; Recovery Technique; Audit Trail; Data Base System





Copyright information

© Springer-Verlag/Wien 1990

Authors and Affiliations

  • Peter Alan Lee (1)
  • Thomas Anderson (1)
  1. Computing Laboratory, University of Newcastle upon Tyne, UK
