Assessing Error Detection Coverage by Simulated Fault Injection

  • Cristian Constantinescu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1667)

Abstract

Server dependability is of increasing importance as more critical applications rely on the client-server computing model. As a consequence, complex fault/error handling mechanisms are becoming common features of today's servers. This paper presents a new simulated fault injection method, which allows the effectiveness of error detection mechanisms to be assessed without using expensive test circuits. Fault injection was performed in two stages. First, physical fault injection was performed on a prototype server. Transient faults were injected into randomly selected signals, and traces of the signals sensitive to transients were captured. A complex protocol checker was devised to increase error detection. In the second stage of the experiment, the new detection circuitry was simulated, with the signal traces, injected with transient faults, used as inputs to the simulation. The error detection coverage and latency were derived. Fault injection also showed that coverage probability was a function of fault duration.
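
As an illustration of how coverage and latency might be derived from injection outcomes, the sketch below groups trials by fault duration and computes a coverage estimate with a confidence interval for each group. The abstract does not state which estimator was used; the plain binomial proportion with a normal-approximation interval, the function names, and the trial record layout are all assumptions made for this example.

```python
import math
from collections import defaultdict


def coverage_estimate(detected, injected, z=1.96):
    """Return (coverage, lower, upper) for detected/injected faults.

    Assumption: a simple binomial proportion with a normal-approximation
    interval; the paper's own estimator is not given in the abstract.
    """
    if injected <= 0:
        raise ValueError("injected must be positive")
    c = detected / injected
    half_width = z * math.sqrt(c * (1.0 - c) / injected)
    return c, max(0.0, c - half_width), min(1.0, c + half_width)


def coverage_by_duration(trials):
    """Summarize trials per fault duration.

    trials: iterable of (fault_duration, detected, latency) tuples, where
    latency is None for undetected faults (hypothetical record layout).
    Returns {duration: ((coverage, lower, upper), mean_latency_or_None)}.
    """
    groups = defaultdict(list)
    for duration, detected, latency in trials:
        groups[duration].append((detected, latency))

    summary = {}
    for duration, results in sorted(groups.items()):
        injected = len(results)
        n_detected = sum(1 for det, _ in results if det)
        latencies = [lat for det, lat in results if det and lat is not None]
        mean_latency = sum(latencies) / len(latencies) if latencies else None
        summary[duration] = (coverage_estimate(n_detected, injected), mean_latency)
    return summary
```

Calling coverage_by_duration on a list of per-injection records gives one coverage estimate and one mean detection latency per fault duration, which is the form in which a dependence of coverage on fault duration would become visible. For small samples or coverage close to 1, the normal approximation is poor and an exact binomial interval would be a better choice.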

Keywords

Error Detection; Coverage Probability; Protocol Violation; Fault Injection; Transient Fault
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Cristian Constantinescu (1)
  1. Intel Corporation, Server Architecture Lab, CO3-202, Hillsboro, USA