
New Approaches in System-Level Diagnosis

  • Arun K. Somani
  • Ofer Peleg
Part of the Frontiers of Computing Systems Research book series (FCSR, volume 2)

Abstract

The concept of system-level fault diagnosis in multiprocessor systems was introduced more than two decades ago. This approach is based on mutual tests conducted by the system's processors, rather than on circuit-level testing performed by an external tester. Initially, research on system-level diagnosis concentrated on uniquely diagnosable systems, and various characterizations for the synthesis of such systems were presented under several models of test-result interpretation and fault types.
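To make the idea of mutual testing concrete, the following Python sketch assumes the classic PMC model of Preparata, Metze and Chien, one of the test-interpretation models mentioned above: a fault-free tester reports the true status of the unit it tests, while a faulty tester's outcome is unreliable. The test assignment `ring_tests`, the always-lying behavior chosen for faulty testers and the brute-force `diagnose` search are illustrative assumptions, not the authors' method. The search enumerates every fault set of size at most t and keeps those consistent with the syndrome, so a single survivor corresponds to unique diagnosis.

```python
from itertools import combinations

def ring_tests(n, k):
    """Hypothetical test assignment: unit i tests units i+1, ..., i+k (mod n)."""
    return [(i, (i + d) % n) for i in range(n) for d in range(1, k + 1)]

def simulate_syndrome(faulty, tests):
    """Outcome 1 means 'tested unit judged faulty'. A fault-free tester reports
    the truth; a faulty tester's outcome is arbitrary, and here it is
    (arbitrarily) modelled as always reporting the opposite of the truth."""
    syndrome = {}
    for u, v in tests:
        truth = int(v in faulty)
        syndrome[(u, v)] = 1 - truth if u in faulty else truth
    return syndrome

def consistent(fault_set, tests, syndrome):
    """Only outcomes from testers outside fault_set constrain the hypothesis."""
    return all(syndrome[(u, v)] == int(v in fault_set)
               for u, v in tests if u not in fault_set)

def diagnose(n, tests, syndrome, t):
    """Brute force: every fault set of size <= t consistent with the syndrome.
    A single candidate means the fault set is uniquely identified."""
    return [set(c) for size in range(t + 1)
            for c in combinations(range(n), size)
            if consistent(set(c), tests, syndrome)]

tests = ring_tests(5, 2)
print(diagnose(5, tests, simulate_syndrome({1, 3}, tests), t=2))  # [{1, 3}]
```

Because it enumerates all candidate sets, this search is exponential in t and is only suitable for small examples.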

Later, new directions evolved from the classic concept of uniquely diagnosable systems. Efforts have been made to overcome some of its deficiencies, such as the limited degree of diagnosability and the large number of test links required. On the one hand, researchers have suggested more practical models of diagnosable systems; on the other, they have tried to generalize and unify the characterizations of uniquely diagnosable systems across various models of test-result interpretation. As a result of these new approaches, other classes of diagnosable systems (or diagnosability measures) have been introduced and characterized.
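The two deficiencies named above can be quantified with simple necessary conditions known for the PMC model: a t-diagnosable system of n units needs n ≥ 2t + 1, and every unit must be tested by at least t other units, which implies at least nt test links. The sketch below, again using a hypothetical ring-style test assignment, computes the largest t these conditions permit. It only gives an upper bound and does not prove that a given system reaches it.

```python
def ring_tests(n, k):
    """Hypothetical test assignment: unit i tests units i+1, ..., i+k (mod n)."""
    return [(i, (i + d) % n) for i in range(n) for d in range(1, k + 1)]

def necessary_t_bound(n, tests):
    """Largest t permitted by two necessary conditions for t-diagnosability
    in the PMC model: n >= 2t + 1, and each unit has >= t distinct testers.
    Reaching this bound does not by itself prove t-diagnosability."""
    testers = {v: set() for v in range(n)}
    for u, v in tests:
        testers[v].add(u)
    fewest_testers = min(len(s) for s in testers.values())
    return min((n - 1) // 2, fewest_testers)

def min_test_links(n, t):
    """Lower bound on test links: n units, each needing t distinct testers."""
    return n * t

print(necessary_t_bound(9, ring_tests(9, 2)))  # 2: capped by 2 testers per unit
print(min_test_links(9, 4))                    # 36 links for t = 4 with 9 units
```

With nine units the size condition alone would allow t = 4, but giving each unit only two testers caps t at 2; raising it to 4 requires at least 36 test links.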

The diagnosability problem (determining the degree of diagnosability of a given system) and the diagnosis problem (identifying the faulty units from a set of test results) have also been addressed extensively in recent years. Polynomial-time algorithms for the diagnosability problem have been introduced for some classes of diagnosable systems, and many polynomial-time diagnosis algorithms, some of them optimal, have been introduced in the last few years for several classes. These include centralized algorithms, executed on a supervising processor, and distributed algorithms, run on the system processors themselves.
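As an illustration of the diagnosability problem, the brute-force sketch below (not one of the polynomial-time algorithms referred to above) computes the largest t for which a small system is t-diagnosable under the assumed PMC model. It relies on the standard distinguishability criterion: no syndrome is consistent with two candidate fault sets exactly when some unit that is fault-free under both hypotheses tests a unit belonging to only one of them. The function names and the ring test assignment are hypothetical.

```python
from itertools import combinations

def ring_tests(n, k):
    """Hypothetical test assignment: unit i tests units i+1, ..., i+k (mod n)."""
    return [(i, (i + d) % n) for i in range(n) for d in range(1, k + 1)]

def fault_sets(n, t):
    """All candidate fault sets with at most t units."""
    return [frozenset(c) for size in range(t + 1)
            for c in combinations(range(n), size)]

def distinguishable(f1, f2, tests):
    """PMC model: no single syndrome fits both f1 and f2 iff some unit that is
    fault-free under both hypotheses tests a unit in exactly one of them."""
    diff = f1 ^ f2
    return any(u not in f1 and u not in f2 and v in diff for u, v in tests)

def is_t_diagnosable(n, tests, t):
    sets = fault_sets(n, t)
    return all(distinguishable(a, b, tests) for a, b in combinations(sets, 2))

def diagnosability(n, tests):
    """Largest t such that the system is t-diagnosable (exponential search)."""
    t = 0
    while t < n and is_t_diagnosable(n, tests, t + 1):
        t += 1
    return t

print(diagnosability(5, ring_tests(5, 2)))  # 2
print(diagnosability(5, ring_tests(5, 1)))  # 1
```

The number of candidate pairs grows exponentially with t, which is why polynomial-time diagnosability algorithms matter for systems of realistic size.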

This survey begins with background on the concept of system-level diagnosis and the classic uniquely diagnosable class, and then concentrates on alternative classes of diagnosable systems, emphasizing those introduced in the last few years. It then describes recent developments in the diagnosability and diagnosis areas and discusses future possibilities.

Copyright information

© Plenum Press, New York 1991

Authors and Affiliations

  • Arun K. Somani (1)
  • Ofer Peleg (1)
  1. Fault Tolerant Computing Laboratory, Department of Electrical Engineering, FT-10, University of Washington, Seattle, USA