Design and Analysis of Algorithm-Based Fault Tolerant Multiprocessor Systems

  • Shalini Yajnik
  • Niraj K. Jha
Part of The Springer International Series in Engineering and Computer Science book series (SECS, volume 284)

Abstract

Algorithm-based fault tolerance (ABFT) is a cost-effective technique for improving the reliability of a multiprocessor system. It uses system-level codes to give the system concurrent error detection and fault diagnosis capabilities. This section gives an overview of the design and analysis techniques used in ABFT.
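
As an illustration of what a system-level code can look like, the following is a minimal sketch of the well-known checksum scheme for matrix multiplication: a column-checksum matrix times a row-checksum matrix yields a full-checksum product, whose row and column sums can be checked to detect and locate a single erroneous data element. It is not taken from this chapter; the function names are hypothetical, and integer data is assumed so that checks can use exact equality (floating-point data would need a tolerance).

```python
# Sketch of checksum-based ABFT for matrix multiplication.
# All names here are illustrative assumptions, not from the chapter.

def column_checksum(a):
    """Return a with an extra row holding the sum of each column."""
    return a + [[sum(col) for col in zip(*a)]]

def row_checksum(b):
    """Return b with an extra column holding the sum of each row."""
    return [row + [sum(row)] for row in b]

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def locate_error(c_full):
    """Check a full-checksum matrix.

    Returns None if every row and column check passes, or (row, col)
    of the data element implicated by exactly one failing row check
    and one failing column check.
    """
    n = len(c_full) - 1      # number of data rows
    m = len(c_full[0]) - 1   # number of data columns
    bad_rows = [i for i in range(n) if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return (bad_rows[0], bad_cols[0])
    raise ValueError("error pattern is not a single data-element error")

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c_full = matmul(column_checksum(a), row_checksum(b))
print(locate_error(c_full))   # fault-free case: all checks pass

c_full[0][1] += 5             # inject a single erroneous data element
print(locate_error(c_full))   # the failing row and column checks locate it
```

The checks run on the encoded output alongside the computation, which is the sense in which error detection is concurrent. A single check per row and column only locates one erroneous element; richer error patterns need the kind of check design and diagnosability analysis the chapter surveys.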

Keywords

Data Element; Fault Tolerance; Dependence Graph; Error Pattern; Unit System

Copyright information

© Kluwer Academic Publishers 1994

Authors and Affiliations

  • Shalini Yajnik
  • Niraj K. Jha
  1. Department of Electrical Engineering, Princeton University, Princeton
