Self-Checking and Self-Exercising Design for Hierarchic Long-Life Fault-Tolerant Systems

  • David Rennels
  • Hyeongil Kim
Part of The Kluwer International Series in Engineering and Computer Science book series (SECS, volume 285)


This research deals with fault-tolerant computers capable of operating for extended periods without external maintenance. Conventional fault-tolerance techniques such as majority voting are unsuitable for these applications, because performance is too low, power consumption is too high, and an excessive number of spares must be included to keep all of the replicated systems working over an extended life. The preferred design approach is to operate as many different computations as possible on single computers, thus maximizing the amount of processing available from limited hardware resources. Fault tolerance is implemented in a hierarchic fashion: fault recovery is done locally within an afflicted computer or, if that is unsuccessful, by the other working computers. Concurrent error detection is required in the computers making up these systems, since errors must be quickly detected and isolated to allow recovery to begin.

This chapter discusses ways of implementing concurrent error detection (i.e., self-checking) and, in addition, providing self-exercising capabilities that can rapidly expose dormant faults and latent errors. The fundamentals of self-checking design are presented along with an example: the design of a self-checking, self-exercising memory system. A new methodology for implementing self-checking in asynchronous subsystems is discussed, along with error simulation results that examine its effectiveness.
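As a rough software illustration of the two ideas named above, the following toy model pairs each stored word with a check bit (concurrent error detection on every read) and adds a background pass that reads every location to surface latent errors (self-exercising). It is a sketch under stated assumptions, not the chapter's VLSI design: the single even-parity bit, the class `CheckedMemory`, and the method names are all hypothetical choices made for this example.

```python
def parity(word: int) -> int:
    """Return the even-parity bit of a non-negative integer word."""
    return bin(word).count("1") % 2


class ParityError(Exception):
    """Raised when a stored word no longer matches its check bit."""


class CheckedMemory:
    """Toy memory: every word is stored with one parity check bit (assumed code)."""

    def __init__(self, size: int):
        self.size = size
        self.data = [0] * size    # stored words
        self.check = [0] * size   # one check bit per word

    def write(self, addr: int, word: int) -> None:
        self.data[addr] = word
        self.check[addr] = parity(word)

    def read(self, addr: int) -> int:
        # Concurrent detection: the check runs on every access.
        word = self.data[addr]
        if parity(word) != self.check[addr]:
            raise ParityError(addr)
        return word

    def exercise(self) -> list:
        """Self-exercising pass: read every location, return failing addresses."""
        bad = []
        for addr in range(self.size):
            try:
                self.read(addr)
            except ParityError:
                bad.append(addr)
        return bad


mem = CheckedMemory(size=8)
mem.write(3, 0b1011)
mem.data[3] ^= 0b0100      # inject a single-bit flip that no program has read yet
latent = mem.exercise()    # the background pass finds it: latent == [3]
```

Without the `exercise` pass, the flipped bit at address 3 would stay hidden until some computation happened to read that word. A single parity bit detects only odd numbers of flipped bits; a real design would choose its code to match the expected fault modes.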


Keywords: Input Pair, Rout Signal, Reset Signal, Concurrent Error Detection, Undetected Error





Copyright information

© Kluwer Academic Publishers 1994

Authors and Affiliations

  • David Rennels
  • Hyeongil Kim
    • 1
  1. Computer Science Department, University of California at Los Angeles, USA
