A fast and accurate hybrid fault injection platform for transient and permanent faults

  • Anderson L. Sartor
  • Pedro H. E. Becker
  • Antonio C. S. Beck

Abstract

Many ground-level and space systems require reliability testing before deployment, since they are increasingly susceptible to transient and permanent faults. Such a process must be accurate, controllable, generic, cheap, and fast. Although gate-level fault injection is often the most appropriate solution when accuracy and controllability are sought, it is very time-consuming. To address this, this work proposes a hybrid fault injection framework that automatically switches between RTL and gate-level simulation modes. Using a complex 8-issue VLIW processor as a case study, we show that the injection process can be accelerated by more than \(10\times \) for transient faults and by almost \(2\times \) for permanent faults over conventional injectors, while maintaining gate-level accuracy and controllability. The proposed framework is generic, so faults can be injected into any arbitrary circuit; this is demonstrated by also injecting faults into a neural network, achieving a speedup of more than \(30\times \).
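
To give a feel for why switching between simulation modes can pay off, the following is a minimal back-of-the-envelope sketch and not the framework's actual implementation. It assumes a simple linear cost model in which every simulated cycle has a fixed cost in each mode; the function name `hybrid_speedup`, its parameters, and all numeric values are hypothetical and chosen only for illustration.

```python
def hybrid_speedup(total_cycles, gate_cycles, gate_cost=50.0,
                   rtl_cost=1.0, switch_overhead=0.0):
    """Estimate the speedup of a hybrid run over a pure gate-level run.

    Hypothetical linear cost model (an assumption, not taken from the paper):
    each cycle costs `rtl_cost` in RTL mode and `gate_cost` in gate-level
    mode, and `switch_overhead` is a fixed cost for changing modes.
    """
    if not 0 <= gate_cycles <= total_cycles:
        raise ValueError("gate_cycles must be between 0 and total_cycles")
    if gate_cost <= 0 or rtl_cost <= 0:
        raise ValueError("per-cycle costs must be positive")

    # Cost of simulating the whole run at gate level.
    pure_gate = total_cycles * gate_cost

    # Cost of simulating only `gate_cycles` at gate level and the rest in RTL.
    hybrid = ((total_cycles - gate_cycles) * rtl_cost
              + gate_cycles * gate_cost
              + switch_overhead)
    return pure_gate / hybrid


if __name__ == "__main__":
    # Illustrative values only: the gate-level window covers 1%, 10%, 50%
    # of a 100000-cycle run.
    for window in (1_000, 10_000, 50_000):
        print(window, round(hybrid_speedup(100_000, window), 2))
```

Under this model the gain depends almost entirely on how small the gate-level window is relative to the whole run: a run that must stay in gate-level mode for most of its cycles gains little, while a run that needs gate-level detail only briefly approaches the ratio `gate_cost / rtl_cost`.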

Keywords

  • Fault injection
  • Reliability
  • RTL simulation
  • Gate-level simulation
  • Soft errors
  • Permanent faults

Notes

Acknowledgements

This study was financed in part by: Pronex 16/0472-2; and the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Institute of Informatics, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
