A Controller Safety Concept Based on Software-Implemented Fault Tolerance for Fail-Operational Automotive Applications

Formal Techniques for Safety-Critical Systems (FTSCS 2015)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 596))

Abstract

We propose to build a fail-operational computing system from a primary self-checking controller and a secondary limp-home controller that guarantees emergency operation if the primary controller suffers a hardware failure. A self-checking controller commonly builds on hardware-implemented fault detection, e.g. lock-stepping, to reach a high diagnostic coverage of hardware faults. Such techniques conflict with features of modern CPUs such as inherently non-deterministic execution. An interesting alternative to hardware-based self-checking is therefore to implement software-based fault detection and recovery on the primary controller, so that its hardware failures are detected and masked. Using stochastic model checking and a prototype fault detection technique, we demonstrate that the proposed approach not only reduces costs but also guarantees higher availability of the computing system at the same safety level as common replicated execution on redundant hardware.
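To illustrate what software-based fault detection can look like, the following minimal sketch uses AN-encoding, a well-known arithmetic code. The abstract does not state which technique the prototype uses, so the choice of AN-encoding, the constant `A`, and all function names are assumptions made for this example only.

```python
# Minimal AN-encoding sketch (illustrative only, not the paper's prototype).
# A value x is stored as the codeword A * x. Every valid codeword is a
# multiple of A, and the sum of two codewords is again a codeword, so a
# corrupted result can be recognised by a failed divisibility check.

A = 58659  # hypothetical odd constant; any odd A > 1 detects single bit flips


class FaultDetected(Exception):
    """Raised when a value is not a valid codeword."""


def encode(x: int) -> int:
    """Turn a plain integer into its AN codeword."""
    return x * A


def check(codeword: int) -> None:
    """Raise FaultDetected if the codeword is not a multiple of A."""
    if codeword % A != 0:
        raise FaultDetected(f"{codeword} is not a multiple of {A}")


def decode(codeword: int) -> int:
    """Verify the codeword and return the plain integer."""
    check(codeword)
    return codeword // A


def add(a: int, b: int) -> int:
    """Add two codewords; the result stays in the code."""
    return a + b


def flip_bit(codeword: int, k: int) -> int:
    """Simulate a hardware fault by flipping bit k."""
    return codeword ^ (1 << k)


if __name__ == "__main__":
    total = add(encode(5), encode(7))
    print(decode(total))  # fault-free path

    try:
        decode(flip_bit(total, 3))  # injected single bit flip
    except FaultDetected as err:
        print("fault detected:", err)
```

A single bit flip changes the codeword by plus or minus a power of two, which is never a multiple of an odd `A` greater than 1, so the check in this sketch catches every such flip. Multi-bit faults and control-flow errors need additional measures that are outside the scope of this example.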

Notes

  1. Electric/Electronic.

  2. FIT: Failure in Time (1 FIT = \(10^{-9}\)/h, i.e. one failure per \(10^{9}\) operating hours).

  3. Robust and Reliant Automotive Computing Environment for Future eCars, www.projekt-race.de/

Corresponding author

Correspondence to Majdi Ghadhab.

Appendix: Sensitivity analysis

To understand how sensitive the measured properties are to the failure rate and the repair rate of the computing platform, we vary one of these parameters at a time (see Tables 3 and 4) while keeping the rest of the specification unchanged.

Table 3. Variation of failure rate \(\lambda \).
Table 4. Variation of repair rate \(\mu \).

1.1 Part 1 - Sensitivity of the “intact”-probability to the parameters failure rate and repair rate (platform 1 vs. 2)

The improvement achieved by platform 2 over platform 1 in the probability of the state “intact” is more significant at high failure rates (Fig. 20) and at low repair rates (Fig. 21). At low failure rates or high repair rates, the probability of the state “intact” is almost identical for platform 1 and platform 2.

Fig. 20. Impact of the failure rate of the primary controller on the probability of the state “intact”.

Fig. 21. Impact of the repair rate on the probability of the state “intact”.

1.2 Part 2 - Sensitivity of the availability to the parameters failure rate and repair rate (platform 2 vs. 3)

The improvement achieved by platform 3 over platform 2 in the availability of the computing platform is almost independent of the failure rate of the primary controller: it is negligible at both high and low failure rates (Fig. 22). However, Fig. 23 shows a clear availability improvement at low repair rates.

Fig. 22. Impact of the failure rate of the primary controller on the availability of the computing platform.

Fig. 23. Impact of the repair rate on the availability of the computing platform.
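The one-at-a-time parameter variation used in this appendix can be sketched as follows. This is a deliberately simplified stand-in, not the paper's model: it assumes a two-state (up/down) continuous-time Markov chain with failure rate \(\lambda\) and repair rate \(\mu\), whose steady-state availability is \(\mu / (\lambda + \mu)\). The baseline values and the function names are hypothetical and are not taken from Tables 3 and 4.

```python
# One-at-a-time sensitivity sweep over a two-state availability model.
# Assumption: the platform is either "up" or "down", fails at rate
# failure_rate and is repaired at rate repair_rate (both per hour).

# Hypothetical baseline specification, not taken from the paper.
BASELINE = {"failure_rate": 1e-5, "repair_rate": 1e-1}


def availability(failure_rate: float, repair_rate: float) -> float:
    """Steady-state probability of the 'up' state: mu / (lambda + mu)."""
    if failure_rate < 0 or repair_rate <= 0:
        raise ValueError("need failure_rate >= 0 and repair_rate > 0")
    return repair_rate / (failure_rate + repair_rate)


def sweep(baseline: dict, name: str, values: list) -> list:
    """Vary one parameter over `values`, keeping all others at baseline."""
    results = []
    for value in values:
        params = dict(baseline)  # copy so the baseline stays unchanged
        params[name] = value
        results.append((value, availability(**params)))
    return results


if __name__ == "__main__":
    print("Variation of failure rate (repair rate fixed):")
    for lam, a in sweep(BASELINE, "failure_rate", [1e-7, 1e-6, 1e-5, 1e-4]):
        print(f"  lambda = {lam:.0e}/h  availability = {a:.8f}")

    print("Variation of repair rate (failure rate fixed):")
    for mu, a in sweep(BASELINE, "repair_rate", [1e-3, 1e-2, 1e-1, 1.0]):
        print(f"  mu = {mu:.0e}/h  availability = {a:.8f}")
```

Comparing platforms as in Figs. 20 to 23 would require one model per platform with the same sweep applied to each; the sketch covers only the sweep mechanics for a single model.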

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Ghadhab, M., Kuntz, M., Kuvaiskii, D., Fetzer, C. (2016). A Controller Safety Concept Based on Software-Implemented Fault Tolerance for Fail-Operational Automotive Applications. In: Artho, C., Ölveczky, P. (eds) Formal Techniques for Safety-Critical Systems. FTSCS 2015. Communications in Computer and Information Science, vol 596. Springer, Cham. https://doi.org/10.1007/978-3-319-29510-7_11

  • DOI: https://doi.org/10.1007/978-3-319-29510-7_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-29509-1

  • Online ISBN: 978-3-319-29510-7

  • eBook Packages: Computer Science, Computer Science (R0)
