Abstract
We propose to build a fail-operational computing system from a primary self-checking controller and a secondary limp-home controller, so that emergency operation is guaranteed if the hardware of the primary controller fails. A self-checking controller commonly relies on hardware-implemented fault detection, e.g. lock-stepping, to reach a high diagnostic coverage of hardware faults. Such techniques conflict with features of modern CPUs such as inherently non-deterministic execution. An interesting alternative to hardware-based self-checking is therefore to implement fault detection and recovery in software on the primary controller, so that its hardware failures are detected and masked. Using stochastic model checking and a prototype fault-detection technique, we show that the proposed approach not only reduces costs but also guarantees higher availability of the computing system at the same safety level as common replicated execution on redundant hardware.
Notes
1. Electric/Electronic.
2. FIT: Failures in Time (1 FIT = \(10^{-9}\)/h, i.e. one failure per \(10^{9}\) hours of operation).
3. Robust and Reliant Automotive Computing Environment for Future eCars, www.projekt-race.de/
Appendix: Sensitivity analysis
To understand how sensitive the measured properties are to the failure rate and the repair rate of the computing platform, we vary one of these parameters at a time (see Tables 3 and 4) while keeping the rest of the specification unchanged.
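The one-parameter-at-a-time procedure can be illustrated with a minimal sketch. It assumes a toy two-state (up/down) continuous-time Markov chain, which is not the PRISM model analysed in this paper; the function names and all numeric rates are hypothetical and are not taken from Tables 3 and 4. For such a chain the steady-state availability is \(\mu / (\lambda + \mu)\), with failure rate \(\lambda\) and repair rate \(\mu\).

```python
"""Toy sensitivity sweep for a two-state (up/down) continuous-time Markov chain.

Assumption: this is NOT the paper's PRISM model; every rate below is made up
purely to illustrate varying one parameter while the other stays fixed.
"""


def fit_to_per_hour(fit: float) -> float:
    """Convert a failure rate in FIT (failures per 1e9 hours) to failures per hour."""
    return fit * 1e-9


def steady_state_availability(failure_rate: float, repair_rate: float) -> float:
    """Long-run probability of the 'up' state: mu / (lambda + mu)."""
    if failure_rate < 0 or repair_rate <= 0:
        raise ValueError("failure_rate must be >= 0 and repair_rate must be > 0")
    return repair_rate / (failure_rate + repair_rate)


def sweep_failure_rate(fits, repair_rate):
    """Vary the failure rate (given in FIT); keep the repair rate (per hour) fixed."""
    return [
        (fit, steady_state_availability(fit_to_per_hour(fit), repair_rate))
        for fit in fits
    ]


def sweep_repair_rate(fit, repair_rates):
    """Vary the repair rate (per hour); keep the failure rate (in FIT) fixed."""
    failure_rate = fit_to_per_hour(fit)
    return [
        (mu, steady_state_availability(failure_rate, mu))
        for mu in repair_rates
    ]


if __name__ == "__main__":
    # Hypothetical parameter grids, chosen only for illustration.
    for fit, availability in sweep_failure_rate([10, 100, 1000, 10000], repair_rate=0.01):
        print(f"failure rate {fit:>6} FIT -> availability {availability:.9f}")
    for mu, availability in sweep_repair_rate(1000, [0.001, 0.01, 0.1, 1.0]):
        print(f"repair rate {mu:>6} 1/h -> availability {availability:.9f}")
```

Even in this toy model the unavailability grows with the failure rate and shrinks with the repair rate, which is the kind of dependence the two parts of the analysis examine on the full platform models.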
1.1 Part 1 - Sensitivity of the “intact”-probability to the parameters failure rate and repair rate (platform 1 vs. 2)
The improvement of platform 2 over platform 1 in the probability of the state “intact” is most significant at high failure rates (Fig. 20) and at low repair rates (Fig. 21). At low failure rates or high repair rates, the probability of the state “intact” is almost identical for platform 1 and platform 2.
1.2 Part 2 - Sensitivity of the availability to the parameters failure rate and repair rate (platform 2 vs. 3)
The improvement of platform 3 over platform 2 in the availability of the computing platform is almost independent of the failure rate of the primary controller: it is negligible at high as well as at low failure rates (Fig. 22). At low repair rates, however, platform 3 shows a clear availability improvement (Fig. 23).
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Ghadhab, M., Kuntz, M., Kuvaiskii, D., Fetzer, C. (2016). A Controller Safety Concept Based on Software-Implemented Fault Tolerance for Fail-Operational Automotive Applications. In: Artho, C., Ölveczky, P. (eds) Formal Techniques for Safety-Critical Systems. FTSCS 2015. Communications in Computer and Information Science, vol 596. Springer, Cham. https://doi.org/10.1007/978-3-319-29510-7_11
DOI: https://doi.org/10.1007/978-3-319-29510-7_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-29509-1
Online ISBN: 978-3-319-29510-7
eBook Packages: Computer Science, Computer Science (R0)