S\(^{3}\)DES - Scalable Software Support for Dependable Embedded Systems

  • Lukas Osinski
  • Jürgen Mottok
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11479)

Abstract

Scalable Software Support for Dependable Embedded Systems (S\(^{3}\)DES) achieves fault tolerance through spatial, software-based triple modular redundancy for both computational and voter processes at the application level. Because the replicas execute in parallel on distinct CPU cores, the approach is a step towards software-based fault tolerance against transient and permanent random hardware errors. Compliance with real-time requirements, in terms of response time, is also improved compared to similar approaches. The replicated voters, the introduced mutual voter monitoring, and the optimized arithmetic encoding allow voter failures to be detected and compensated without backward recovery. Fault injection experiments on real hardware reveal that S\(^{3}\)DES detects and masks all injected data and program flow errors under a single-fault assumption, whereas an uncoded voting scheme yields approximately 12% silent data corruptions in a similar experiment.
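
The combination of majority voting and arithmetic encoding described in the abstract can be illustrated with a minimal sketch. It assumes a plain AN-code, in which a value is multiplied by a constant and any code word not divisible by that constant is treated as corrupted. The constant `A` and the function names `encode`, `is_valid`, `decode` and `vote` are hypothetical choices for illustration. The sketch does not reproduce the paper's optimized encoding, its replicated voters, or the mutual voter monitoring.

```python
from collections import Counter

# Hypothetical encoding constant, picked only for this illustration.
A = 58659


def encode(value: int) -> int:
    """Encode an integer as an AN-code word (value multiplied by A)."""
    return value * A


def is_valid(code_word: int) -> bool:
    """A code word is valid only if it is a multiple of A."""
    return code_word % A == 0


def decode(code_word: int) -> int:
    """Recover the original value, rejecting corrupted code words."""
    if not is_valid(code_word):
        raise ValueError("corrupted code word")
    return code_word // A


def vote(replica_outputs):
    """Majority vote over the encoded outputs of three replicas.

    Invalid code words are discarded first; the result is accepted only
    if at least two valid code words agree.
    """
    valid = [word for word in replica_outputs if is_valid(word)]
    if not valid:
        raise RuntimeError("no valid replica output")
    word, count = Counter(valid).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority among valid replica outputs")
    return word


# Usage: one replica suffers a single bit flip in its encoded result.
good = encode(42)
flipped = good ^ (1 << 3)
result = decode(vote([good, flipped, good]))
```

In this sketch the voter works on encoded values throughout, so a corrupted input is rejected by the divisibility check before it can take part in the vote.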

Keywords

Fault tolerance, Multi-core, Arithmetic encoding, Triple modular redundancy, Replicated voting

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Laboratory for Safe and Secure Systems (LaS³), Technical University of Applied Sciences Regensburg, Regensburg, Germany