Testing, Checking, and Hardware Syndrome

Schagaev, Igor; Zouev, Eugene; Thomas, Kaegi

doi:10.1007/978-3-030-21244-5_7


Igor Schagaev⁴,
Eugene Zouev⁵ &
Kaegi Thomas⁴

Chapter
First Online: 10 July 2019


Abstract

In previous chapters, we introduced the processes of checking and testing, the first of the three main processes of the Generalized Algorithm of Fault Tolerance (GAFT). In this chapter, we further discuss the process of checking hardware: first software-based hardware checking, and then hardware-based checking. For software-based hardware checking, we show what a software-based test should include, when such tests are preferable to hardware-based checking schemes, and, especially, how they can be scheduled in the system without interfering with ongoing real-time tasks. To support the handling of hardware-based checking, we introduce a new system condition descriptor, the so-called syndrome, and illustrate how it can be used as a mechanism to signal the hardware condition to the operating system, including the manifestation of a detected error. We then show the steps the runtime system performs to eliminate the fault and, in the case of permanent errors, how software can reconfigure the hardware to exclude the faulty element. We also explain in which cases software has to adapt to the new hardware topology.

We start by explaining how software-based checks can be used to detect hardware faults. Runtime systems use online or offline scheduling mechanisms to manage the tasks of programs, both their own system software and user applications. Since [1,2,3,4], runtime systems have been expected to provide a dedicated task-scheduling session (offline, or online during execution) for diagnosing hardware conditions; recall the start-up delays of Apple and Microsoft systems. Later, for systems operating in the domain of real-time monitoring, where tasks are critical in execution time, hardware availability and the efficiency of process scheduling became crucial. In turn, testing itself became a “hot” issue in terms of the time it requires and the hardware coverage it achieves.
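The abstract does not specify the syndrome's format. As a minimal sketch, assume the syndrome is a bit vector in which bit i flags a detected error in hardware element i, and assume the runtime system separates transient from permanent faults by re-running the flagged element's check a few times before reconfiguring. The element names, `decode_syndrome`, `RuntimeSystem`, `recheck`, and `max_retries` are hypothetical illustrations, not the chapter's interface.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical hardware elements; bit i of the syndrome refers to ELEMENTS[i].
ELEMENTS = ("CPU0", "CPU1", "RAM_BANK0", "RAM_BANK1")


def decode_syndrome(syndrome: int) -> list[str]:
    """Return the names of the elements whose error bit is set."""
    return [name for i, name in enumerate(ELEMENTS) if (syndrome >> i) & 1]


@dataclass
class RuntimeSystem:
    # Elements currently included in the active hardware configuration.
    active: set[str] = field(default_factory=lambda: set(ELEMENTS))
    max_retries: int = 2  # assumed retry budget per flagged element

    def handle(self, syndrome: int, recheck: Callable[[str], bool]) -> dict[str, str]:
        """Classify each flagged element and reconfigure on permanent faults.

        recheck(name) re-runs the element's check and returns True if it passes.
        One passing recheck marks the error as transient; if every recheck
        fails, the fault is treated as permanent and the element is excluded.
        """
        verdicts: dict[str, str] = {}
        for name in decode_syndrome(syndrome):
            if any(recheck(name) for _ in range(self.max_retries)):
                verdicts[name] = "transient"
            else:
                self.active.discard(name)  # exclude the faulty element
                verdicts[name] = "permanent"
        return verdicts


# Usage: bits 0 and 2 are set, so CPU0 and RAM_BANK0 are flagged.
# The simulated recheck passes for CPU0 only.
rt = RuntimeSystem()
verdicts = rt.handle(0b0101, recheck=lambda name: name == "CPU0")
print(verdicts)           # {'CPU0': 'transient', 'RAM_BANK0': 'permanent'}
print(sorted(rt.active))  # ['CPU0', 'CPU1', 'RAM_BANK1']
```

The sketch stops at updating the set of active elements. Deciding whether the software must also adapt to the reduced topology, for example by moving tasks away from an excluded processor, is the separate question the chapter raises.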
Thus, in this chapter we first analyze simple testing sequences for the hardware elements of computer systems. We then introduce the concept of a hardware-testing procedure that is transparent to user applications. This makes it possible to verify the integrity of the computer system hardware and to guarantee it within a reasonable time, without delaying the execution of user tasks.
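One common way to make hardware testing transparent to user tasks is to split a test into short, preemptible slices and run them only in ticks where no user job is ready (background, or idle-time, scheduling). The sketch below is a minimal discrete-time simulation under assumed conditions: periodic tasks whose deadlines equal their periods, rate-monotonic priorities (Liu and Layland), and a test that can be interrupted at any tick boundary without side effects on user tasks. `Task`, `simulate`, and the task parameters are hypothetical; this is not presented as the chapter's own scheduling procedure.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    wcet: int    # execution time per job, in ticks
    period: int  # release period and relative deadline, in ticks


def simulate(tasks: list[Task], test_slices: int, horizon: int):
    """Rate-monotonic simulation with hardware testing in idle ticks only.

    Returns (timeline, finish_tick, missed): the per-tick activity, the tick at
    which the last test slice completed (None if unfinished), and any
    (task, deadline) pairs where a job was still unfinished at its deadline.
    """
    by_priority = sorted(tasks, key=lambda t: t.period)  # shorter period first
    remaining = {t.name: 0 for t in tasks}
    timeline: list[str] = []
    finish_tick = None
    missed: list[tuple[str, int]] = []
    for now in range(horizon):
        for t in tasks:
            if now % t.period == 0:           # a new job is released
                if remaining[t.name] > 0:     # the previous job missed its deadline
                    missed.append((t.name, now))
                remaining[t.name] = t.wcet
        ready = next((t for t in by_priority if remaining[t.name] > 0), None)
        if ready is not None:                 # user work always takes precedence
            remaining[ready.name] -= 1
            timeline.append(ready.name)
        elif test_slices > 0:                 # idle tick: run one test slice
            test_slices -= 1
            timeline.append("TEST")
            if test_slices == 0:
                finish_tick = now + 1
        else:
            timeline.append("idle")
    return timeline, finish_tick, missed


tasks = [Task("A", wcet=1, period=4), Task("B", wcet=2, period=6)]
with_test, finished, missed = simulate(tasks, test_slices=5, horizon=12)
without_test, _, _ = simulate(tasks, test_slices=0, horizon=12)
print(with_test)  # ['A', 'B', 'B', 'TEST', 'A', 'TEST', 'B', 'B', 'A', 'TEST', 'TEST', 'TEST']
print(finished, missed)  # 12 []
```

Replacing every `TEST` entry in `with_test` with `idle` yields exactly `without_test`: user jobs run in the same ticks whether or not the test is present, which is the transparency property. The cost is that the test's completion time depends on the slack the task set leaves; with less slack, covering the whole hardware takes more hyperperiods.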


References

1. Kirby W et al (1985) The NMFECC Cray time-sharing system. Softw Pract Exper 15(1):87–103
2. Serlin O (1984) Fault-tolerant systems in commercial applications. Computer 17(8):19–30
3. Blazewicz J et al (2007) Handbook on scheduling: from theory to applications. Springer, Berlin, Heidelberg
4. Ingo M (2002) Linux kernel archive. World Wide Web electronic publication, January 03, 2002
5. Bogdanov J, Schagaev I (1990) Sliding slotting diagnosis in multiprocessors. In: IMECO congress proceedings, pp 141–150
6. Garey M, Johnson D (1979) Computers and intractability: a guide to the theory of NP-completeness. W.H. Freeman and Company
7. Knuth D (1998) The art of computer programming, vol 3: Sorting and searching. Addison-Wesley Longman, Amsterdam
8. Johannes M (2002) The active object system: design and multiprocessor implementation. ETH Zurich, Zurich
9. Liu CL, Layland J (1973) Scheduling algorithms for multiprogramming in a hard-real-time environment. J ACM 20(1):46–61
10. Castano V, Schagaev I (2014) Resilient computer system design. Springer. ISBN 978-3-319-15069-7
11. Blaeser L, Monkman S, Schagaev I (2014) Evolving systems. In: Proceedings of the international conference on foundations of computer science (FCS’14), WORLDCOMP 2014. CSREA Press. ISBN 1-60132-270-4
12. Monkman S, Schagaev I (2013) Redundancy + reconfigurability = recoverability. Electronics 2(3):212–233. ISSN 2079-9292. https://doi.org/10.3390/electronics2030212
13. Buhanova G, Schagaev I (2001) Comparative study of fault tolerant RAM structures. In: Proceedings of the IEEE dependable systems and networks conference, Göteborg, July 10, 2001. https://www.academia.edu/7140850/


Author information

Authors and Affiliations

IT-ACS Ltd, Stevenage, UK
Igor Schagaev & Kaegi Thomas
Department of Informatics, Technopolis, Innopolis, Kazan, Russia
Eugene Zouev


Corresponding author

Correspondence to Igor Schagaev.


About this chapter

Cite this chapter

Schagaev, I., Zouev, E., Thomas, K. (2020). Testing, Checking, and Hardware Syndrome. In: Software Design for Resilient Computer Systems. Springer, Cham. https://doi.org/10.1007/978-3-030-21244-5_7


DOI: https://doi.org/10.1007/978-3-030-21244-5_7
Published: 10 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21243-8
Online ISBN: 978-3-030-21244-5
eBook Packages: Engineering, Engineering (R0)
