Fault Tolerant VLSI Multicomputers

  • Chapter
VLSI CAD Tools and Applications

Abstract

An approach is presented for increasing the reliability of future high-end systems beyond what is possible with technological solutions alone. The system consists of computation nodes and communication nodes, interconnected by high-speed dedicated links. These components are relied upon to detect errors, while system-level protocols are used for error recovery and reconfiguration. The use of duplication and matching for implementing the self-checking nodes allows us to restrict a detailed analysis of the impact of all possible faults to the comparator, a circuit that can be implemented in a relatively straightforward way in NMOS or CMOS technology.
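
The division of labor described in the abstract (nodes detect errors by duplication and matching, system-level protocols handle recovery) can be illustrated with a small software sketch. This is only a toy model under stated assumptions, not the chapter's hardware design: the names `SelfCheckingNode`, `NodeError`, and `run_with_recovery` are hypothetical, and the "switch to a spare node and redo the input" policy merely stands in for whatever recovery and reconfiguration protocol the system actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


class NodeError(Exception):
    """Signals that the two module copies disagreed (hypothetical name)."""


@dataclass
class SelfCheckingNode:
    # Two copies of the same functional module (illustrative stand-ins
    # for duplicated hardware).
    module_a: Callable[[int], int]
    module_b: Callable[[int], int]

    def step(self, x: int) -> int:
        out_a = self.module_a(x)
        out_b = self.module_b(x)
        # The comparator: the node reports an error whenever the copies
        # differ. It cannot tell which copy is wrong, only that one is.
        if out_a != out_b:
            raise NodeError(f"mismatch on input {x}: {out_a} != {out_b}")
        return out_a


def run_with_recovery(
    primary: SelfCheckingNode, spare: SelfCheckingNode, inputs: List[int]
) -> Tuple[List[int], int]:
    """Assumed recovery policy: on a detected error, reconfigure to the
    spare node and redo the failed input. Returns (results, failovers)."""
    active = primary
    results: List[int] = []
    failovers = 0
    for x in inputs:
        try:
            results.append(active.step(x))
        except NodeError:
            active = spare          # reconfiguration
            failovers += 1
            results.append(active.step(x))  # recovery: redo the work
    return results, failovers


if __name__ == "__main__":
    good = lambda x: x + 1
    faulty = lambda x: 0 if x == 3 else x + 1  # wrong only for input 3
    primary = SelfCheckingNode(good, faulty)
    spare = SelfCheckingNode(good, good)
    print(run_with_recovery(primary, spare, [1, 2, 3, 4]))
```

The sketch also shows why the comparator deserves the detailed fault analysis: every other fault is caught only because the comparator flags the disagreement. A fault that makes both copies produce the same wrong output would pass unnoticed in this model.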

Copyright information

© 1987 Kluwer Academic Publishers

About this chapter

Cite this chapter

Séquin, C.H., Tamir, Y. (1987). Fault Tolerant VLSI Multicomputers. In: Fichtner, W., Morf, M. (eds) VLSI CAD Tools and Applications. The Kluwer International Series in Engineering and Computer Science, vol 24. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1985-6_15

  • DOI: https://doi.org/10.1007/978-1-4613-1985-6_15

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4612-9186-2

  • Online ISBN: 978-1-4613-1985-6

  • eBook Packages: Springer Book Archive
