Fault-Detection by Result-Checking for the Eigenproblem

  • Paula Prata
  • João Gabriel Silva
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1667)

Abstract

This paper proposes a new fault-detection mechanism for the computation of eigenvalues and eigenvectors, the so-called eigenproblem, for which, to the best of our knowledge, no such scheme previously existed. It consists of a number of assertions that can be executed on the results of the computation to determine their correctness. The scheme follows the Result-Checking principle, since it does not depend on the particular numerical algorithm used. It handles both real and complex matrices, symmetric or not. Many practical issues, such as rounding errors and eigenvalue ordering, are addressed, and a practical implementation was built on top of unmodified routines of the well-known LAPACK library. The proposed scheme is very efficient, with less than 2% performance overhead for medium to large matrices; very effective, exhibiting a fault coverage greater than 99.7% at a confidence level of 99% in extensive fault-injection experiments; and very easy to adapt to other libraries of mathematical routines besides LAPACK.
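
The abstract does not list the paper's assertions, so the following is only a minimal sketch of what result-checking an eigensolver's output could look like. It assumes two standard algebraic identities as the checks (the residual A v = λ v for each returned pair, and trace(A) equal to the sum of the eigenvalues). The function name `check_eigenpairs` and the tolerance `tol` are hypothetical and are not taken from the paper or from LAPACK.

```python
import math


def matvec(a, v):
    """Multiply a square matrix (given as a list of rows) by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def norm_inf(v):
    """Infinity norm of a vector; abs() also works for complex entries."""
    return max(abs(x) for x in v)


def check_eigenpairs(a, eigenvalues, eigenvectors, tol=1e-10):
    """Hypothetical result checker, independent of how the pairs were computed.

    Assumed assertions (illustrative, not the paper's actual set):
      1. ||A v - lambda v|| is small relative to ||A|| * ||v|| for each pair.
      2. trace(A) matches the sum of the eigenvalues.
    The tolerance `tol` is an assumed value that absorbs rounding error.
    """
    n = len(a)
    if len(eigenvalues) != n or len(eigenvectors) != n:
        return False
    # Infinity norm of A, used to scale the tolerance (1.0 if A is all zeros).
    scale = max(sum(abs(x) for x in row) for row in a) or 1.0
    for lam, v in zip(eigenvalues, eigenvectors):
        v_norm = norm_inf(v)
        if v_norm == 0:
            return False  # the zero vector is never a valid eigenvector
        residual = [av_i - lam * v_i for av_i, v_i in zip(matvec(a, v), v)]
        if norm_inf(residual) > tol * scale * v_norm:
            return False
    trace = sum(a[i][i] for i in range(n))
    return abs(trace - sum(eigenvalues)) <= tol * scale * n


# Usage: a small symmetric matrix with known eigenpairs.
s = 1.0 / math.sqrt(2.0)
A = [[2.0, 1.0], [1.0, 2.0]]
good = check_eigenpairs(A, [3.0, 1.0], [[s, s], [s, -s]])
# Simulate a faulty result by corrupting one eigenvalue.
bad = check_eigenpairs(A, [3.0, 1.001], [[s, s], [s, -s]])
```

Because the checker only inspects the returned eigenvalues and eigenvectors, it can wrap any solver without modifying it, which is the property the abstract attributes to result checking. A production checker would need a tolerance derived from rounding-error analysis rather than the fixed value assumed here.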

Keywords

Fault Coverage; Fault Injection; Gold Code; Wrong Output; Execution Overhead
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Paula Prata (1)
  • João Gabriel Silva (2)
  1. Dep. Matemática/Informática, Universidade da Beira Interior, Covilhã, Portugal
  2. Dep. Eng. Informática/CISUC, Universidade de Coimbra, Coimbra, Portugal