Abstract
We present an approach to increasing the reliability of future high-end systems beyond what technological solutions alone can achieve. The system consists of computation nodes and communication nodes interconnected by high-speed dedicated links. These nodes are relied upon to detect errors, while system-level protocols perform error recovery and reconfiguration. Because the self-checking nodes are implemented by duplication and matching, a detailed analysis of the impact of all possible faults can be restricted to the comparator. The comparator is a circuit that can be implemented in a relatively straightforward way in NMOS or CMOS technology.
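The abstract gives no circuit-level detail, so what follows is only a rough behavioral sketch of duplication and matching. It assumes that a node's logic can be modeled as a function computed by two identical copies, with a comparator checking their outputs. All names below are hypothetical: `SelfCheckingNode`, `MismatchError`, `increment`, and `increment_stuck_bit2`. The choice to stop the node after a mismatch is also an illustrative assumption. The sketch does not model the NMOS/CMOS comparator itself.

```python
from dataclasses import dataclass
from typing import Callable


class MismatchError(Exception):
    """Hypothetical error signal raised when the comparator sees differing outputs."""


@dataclass
class SelfCheckingNode:
    """Duplicate-and-compare node (illustrative sketch, not the authors' design)."""
    module_a: Callable[[int], int]  # first copy of the node's logic
    module_b: Callable[[int], int]  # second, identical copy
    stopped: bool = False           # set once an error has been detected

    def step(self, x: int) -> int:
        if self.stopped:
            raise MismatchError("node stopped after an earlier detected error")
        out_a = self.module_a(x)
        out_b = self.module_b(x)
        # The comparator is the only checking logic: any disagreement is an error.
        if out_a != out_b:
            # Assumed behavior: stop emitting outputs and leave recovery
            # to system-level protocols.
            self.stopped = True
            raise MismatchError(f"outputs differ: {out_a} != {out_b}")
        return out_a


def increment(x: int) -> int:
    return x + 1


def increment_stuck_bit2(x: int) -> int:
    # Hypothetical faulty copy: bit 2 of the result is stuck at 1.
    return (x + 1) | 0b100


if __name__ == "__main__":
    healthy = SelfCheckingNode(increment, increment)
    print(healthy.step(3))

    faulty = SelfCheckingNode(increment, increment_stuck_bit2)
    print(faulty.step(3))  # bit 2 of 4 is already 1, so the fault is not activated
    try:
        faulty.step(0)     # 1 vs 5: the comparator flags the mismatch
    except MismatchError as e:
        print("error detected:", e)
```

The faulty node shows that a fault is caught only when some input makes the two copies disagree. Input 3 masks the stuck bit, while input 0 exposes it. A fault that affected both copies identically would not be detected by matching alone.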
Copyright information
© 1987 Kluwer Academic Publishers
About this chapter
Cite this chapter
Séquin, C.H., Tamir, Y. (1987). Fault Tolerant VLSI Multicomputers. In: Fichtner, W., Morf, M. (eds) VLSI CAD Tools and Applications. The Kluwer International Series in Engineering and Computer Science, vol 24. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1985-6_15
DOI: https://doi.org/10.1007/978-1-4613-1985-6_15
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4612-9186-2
Online ISBN: 978-1-4613-1985-6
eBook Packages: Springer Book Archive