Recovery: Searching and Monitoring of Correct Software States

  • Igor Schagaev
  • Eugene Zouev
  • Kaegi Thomas


The last of the three GAFT processes is recovery and recovery monitoring. After an error has been detected and the hardware possibly reconfigured, the final step is to recover the software, that is, to eliminate the effect of the error on the software. In line with the previous chapters and [1, 2, 3, 4, 5, 6], recovery consists of restoring the last recovery point and continuing processing. But is this really sufficient? What if latent faults exist in the system and corrupt its state, but trigger the detection schemes only an arbitrary time later? Given this plausible and unpleasant sequence of events, it becomes clear that restoring data and program from the last stored recovery point is not enough. We have no guarantee that the effect of the fault has been eliminated: even when the hardware is restored or reconfigured, erroneous software states may already have been recorded in recovery points. Thus, we have to consider the recovery process itself and analyze which classic algorithms are applicable and suited to efficient recovery. We introduce and analyze three recovery algorithms that are able to ensure successful recovery by iteratively going through the stored recovery points.
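As a rough illustration of what searching through stored recovery points can look like, the sketch below walks backwards from the newest point until one passes a correctness check. It is a minimal sketch under stated assumptions, not one of the three algorithms analyzed in the chapter: the function `find_correct_recovery_point`, the `is_correct` predicate, and the `corrupted` flag on each point are hypothetical names introduced only for this example.

```python
from typing import Callable, Optional, Sequence, TypeVar

State = TypeVar("State")


def find_correct_recovery_point(
    recovery_points: Sequence[State],
    is_correct: Callable[[State], bool],
) -> Optional[int]:
    """Return the index of the newest recovery point that passes the check.

    Points are assumed to be ordered oldest first. None is returned when
    no stored point passes, i.e. recovery from stored points is impossible.
    """
    # Scan from the newest point back to the oldest one.
    for index in range(len(recovery_points) - 1, -1, -1):
        if is_correct(recovery_points[index]):
            return index
    return None


# Hypothetical history: a latent fault corrupted the two newest points
# before any detection scheme was triggered.
points = [{"id": i, "corrupted": i >= 3} for i in range(5)]
restored = find_correct_recovery_point(points, lambda p: not p["corrupted"])
```

In this hypothetical history the scan rejects points 4 and 3 and settles on point 2, whereas restoring only the last recovery point would have resumed from a corrupted state. A linear backward scan is just the simplest possible search order and is used here for clarity.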


References

  1. Sogomonian E, Schagaev I (1988) Hardware and software fault tolerance of computer systems. Avtom I Telemekhanika, 3–39
  2. Schagaev I (1989) Computing process recovery algorithms. Avtomat Telemekh (4)
  3. Schagaev I (1990) Using software recovery methods for determining the type of hardware faults. Autom Remote Control 51(3)
  4. Schagaev I (2008) Reliability of malfunction tolerance. In: International multi-conference on computer science and information technology, 2008. IMCSIT 2008, October 2008, pp 733–737
  5. Schagaev I et al (2010) ERA: evolving reconfigurable architecture. In: 11th ACIS International Conference, June 2010, pp 215–220
  6. Castano V, Schagaev I (2015) Resilient computer system design. Springer. ISBN 978-3-319150-68-0
  7. Schagaev I (1986) Algorithms of computation recovery. Autom Remote Control 7:26, 36, 65, 122
  8. Schagaev I (1987) Algorithms for restoring a computing process. Autom Remote Control 48(4):26, 65, 122, 141, 149
  9. Schagaev I (1989) Instructions retry in microprocessor recovery algorithms. In: IMEKO—FTSD symposium
  10. Schagaev I (1990) Yet another approach to classification of redundancy. In: IBID
  11. Schagaev I (1986) Relationship between the formation of program recovery points and equipment reliability indices. Autom Remote Control 47
  12. Kowalk W (2006) CRC cyclic redundancy check. Technical report. Universität Oldenburg Fachbereich Informatik 05.09.06
  13. Hamming R (1950) Error detection and error correction codes. Bell Syst Tech J XXVI:147–160
  14. Moon T (2005) Error correction coding. Wiley, New Jersey
  15. Schagaev I (1986, December) Using data redundancy for program rollback. Autom Remote Control 47(7), Part 2:1009–1016
  16. Schagaev I, Viktorova V (1990) Comparative analysis of the efficiency of computation-process recovery algorithms. Autom Remote Control 51(1)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. IT-ACS Ltd, Stevenage, UK
  2. Department of Informatics, Technopolis, Innopolis, Kazan, Russia
