Recovery: Searching and Monitoring of Correct Software States

Schagaev, Igor; Zouev, Eugene; Thomas, Kaegi

doi:10.1007/978-3-030-21244-5_9

Abstract

The last of the three GAFT processes is recovery and recovery monitoring. After the detection of an error and possible reconfiguration, the final step is to recover the software, which means eliminating the effect of the error on the software. In line with the previous chapters and [1,2,3,4,5,6], recovery consists of restoring the last recovery point and continuing the processing. But is this really sufficient? What if latent faults exist in the system and manifest themselves, but trigger a detection scheme only an arbitrary time later? Given this plausible and unpleasant sequence of events, it becomes clear that simply restoring data and program from the last stored recovery point is not enough. We have no guarantee that the fault has been eliminated: even when the hardware is restored or reconfigured, erroneous software states may already be recorded in the recovery points. Thus, we have to consider the recovery process itself and analyze which classic algorithms are applicable and fit the purpose of efficient recovery. We introduce and analyze three recovery algorithms that ensure successful recovery by iteratively going through all stored recovery points.
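
As an illustration only, the idea of iterating through stored recovery points can be sketched as a backward linear search. This is not one of the chapter's three algorithms, which the abstract does not specify; the names `RecoveryPoint`, `recover`, and `is_consistent` are hypothetical, and the sketch assumes that some application-level check can tell a correct stored state from an erroneous one.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class RecoveryPoint:
    """Hypothetical snapshot of program state taken at a given step."""
    step: int
    state: dict


def recover(
    points: Sequence[RecoveryPoint],
    is_consistent: Callable[[dict], bool],
) -> Optional[RecoveryPoint]:
    """Return the most recent recovery point whose state passes the check.

    Points are assumed to be ordered oldest first, so the search walks
    the sequence backwards, from the newest snapshot to the oldest.
    """
    for point in reversed(points):
        if is_consistent(point.state):
            return point
    return None  # every stored point is suspect; a full restart is needed


def sums_match(state: dict) -> bool:
    """Assumed consistency check: the stored total must equal the item sum."""
    return state["total"] == sum(state["items"])


points = [
    RecoveryPoint(step=0, state={"total": 0, "items": []}),
    RecoveryPoint(step=1, state={"total": 3, "items": [1, 2]}),
    # Latent fault: this snapshot was recorded after the state was corrupted.
    RecoveryPoint(step=2, state={"total": 99, "items": [1, 2, 4]}),
]

chosen = recover(points, sums_match)
print(chosen.step)  # prints 1
```

The newest point (step 2) is rejected because it was recorded after the fault took effect, so processing would resume from step 1. The sketch relies entirely on the assumed check; a fault that the check cannot see would still pass through.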

References

Sogomonian E, Schagaev I (1988) Hardware and software fault tolerance of computer systems. Avtom I Telemekhanika, 3–39
Schagaev I (1989) Computing process recovery algorithms. Avtomat Telemekh (4)
Schagaev I (1990) Using software recovery methods for determining the type of hardware faults. Autom Remote Control 51(3)
Schagaev I (2008) Reliability of malfunction tolerance. In: International multi-conference on computer science and information technology, 2008. IMCSIT 2008, October 2008, pp 733–737
Schagaev I et al (2010) ERA: evolving reconfigurable architecture. In: 11th ACIS International Conference, June 2010, pp 215–220
Castano V, Schagaev I (2015) Resilient computer system design. Springer. ISBN 978-3-319150-68-0
Schagaev I (1986) Algorithms of computation recovery. Autom Remote Control 7:26, 36, 65, 122
Schagaev I (1987) Algorithms for restoring a computing process. Autom Remote Control 48(4):26, 65, 122, 141, 149
Schagaev I (1989) Instructions retry in microprocessor recovery algorithms. In: IMEKO—FTSD symposium
Schagaev I (1990) Yet another approach to classification of redundancy. In: IBID
Schagaev I (1986) Relationship between the formation of program recovery points and equipment reliability indices. Autom Remote Control 47
Kowalk W (2006) CRC cyclic redundancy check. Technical report. Universität Oldenburg Fachbereich Informatik 05.09.06
Hamming R (1950) Error detecting and error correcting codes. Bell Syst Tech J 29(2):147–160
Moon T (2005) Error correction coding. Wiley, New Jersey
Schagaev I (1986, December) Using data redundancy for program rollback. Autom Remote Control 47(7), Part 2:1009–1016
Schagaev I, Viktorova V (1990) Comparative analysis of the efficiency of computation-process recovery algorithms. Autom Remote Control 51(1)

Author information

Authors and Affiliations

IT-ACS Ltd, Stevenage, UK
Igor Schagaev & Kaegi Thomas
Department of Informatics, Technopolis, Innopolis, Kazan, Russia
Eugene Zouev

Corresponding author

Correspondence to Igor Schagaev.

About this chapter

Cite this chapter

Schagaev, I., Zouev, E., Thomas, K. (2020). Recovery: Searching and Monitoring of Correct Software States. In: Software Design for Resilient Computer Systems. Springer, Cham. https://doi.org/10.1007/978-3-030-21244-5_9

DOI: https://doi.org/10.1007/978-3-030-21244-5_9
Published: 10 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21243-8
Online ISBN: 978-3-030-21244-5
eBook Packages: Engineering, Engineering (R0)
