The Evolution of Fault Tolerant Computing at the University of Illinois

  • J. A. Abraham
  • G. Metze
  • R. K. Iyer
  • J. H. Patel
Conference paper
Part of the Dependable Computing and Fault-Tolerant Systems book series (DEPENDABLECOMP, volume 1)

Abstract

The University of Illinois has been active in fault-tolerant computing research for over 25 years. Its researchers have proposed fundamental ideas and made major contributions in the areas of testing and diagnosis, concurrent error detection, and fault tolerance. This paper traces the origins of these ideas and their development within the University of Illinois, describes their influence on research at other institutions, and outlines current directions of research.

Keywords

Fault Diagnosis; Fault Tolerance; Fault Model; Logic Array; Concurrent Error Detection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag/Wien 1987

Authors and Affiliations

  • J. A. Abraham (1)
  • G. Metze (1)
  • R. K. Iyer (1)
  • J. H. Patel (1)
  1. Department of Electrical and Computer Engineering and the Coordinated Science Laboratory, University of Illinois, Urbana, USA
