Abstract
This essay is based upon my recollections pertinent to fault tolerant computing. The material included is determined by my interactions with talented and adventurous colleagues and with the general computing community. This means that many worthy and interesting projects will be slighted. I apologize to all who worked on these projects, and blame my memory. This essay begins with my work on the ENIAC (modified to be a writable ROM microprogram controlled computer) and continues through my work helping with the design of dependable (for the period) data processing systems at Raytheon, Datamatic, and Honeywell. My report on work at IBM begins with HARVEST, includes S/360, and ends with my early work at IBM Research in fault tolerant computing, including projects started in the early 1970s, some of which were published later. After the founding of the IEEE FTTC and the Annual Fault Tolerant Computing Symposia in 1971, there is so much published material that I shall let the professional historians sort it out.
Bibliography
Agnew, R. W., et al, 1967: An Approach to Self-Repairing Computers, Dig. 1st IEEE Computer Group Conf., pp. 37–46.
Alonso, R. L., Blair-Smith, H., Hopkins, A. L., 1963: Some Aspects of the Logical Design of a Control Computer: A Case Study, IEEE TEC, pp. 687–698.
Anderson, D. A., Metze, G., 1973: Design of Totally Selfchecking Check Circuits for m-out-of-n Codes, IEEE TC, pp. 263–269.
Anderson, J. E., 1968: 7 Years of OAO, 1968 Product Assurance Conf., Hofstra University.
Arlat, J., Carter, W. C., 1984: Implementation and Evaluation of a (b,k)-Adjacent Error-Correcting/Detecting Scheme for Supercomputer Systems, IBM J. R&D, 28, No. 2, pp. 159–169.
Ball, M., Hardie, F., 1969: Effect & Detection of Intermittent Failures in Digital Systems, FJCC, V. 35, pp. 329–336.
Bark, A., Kinne, C. B., 1953: The Application of Pulse Position Modulation to Digital Computers, Proc. NEC, pp. 656–664.
Basche, C. J., Bucholz, W., Rochester, N., 1954: The IBM 702, an Electronic Data Processing Machine for Business, JACM, V. 1, pp. 149–169.
Berger, J. M., 1961: A Note on Error Detection Codes for Asymmetric Channels, Info. & Control, pp. 68–73.
Bjork, L. A., 1973: Recovery Scenario for a DB/DC System, Proc. ACM Annual Conf., pp. 142–146.
Block, R. M. et al, 1948: The Logical Design of the Raytheon Computer, MTAC, Bu. Stds.
Bock, R. V., Toth, A. P., 1965: Hardware & Software for Maintenance in the B5500 Processor, IEEE Int. Conf., pp. 65–72.
Bossen, D. C., 1970: b-adjacent Error Correction, IBM J. R&D, V. 14, pp. 402–408.
Bossen, D. C., Hong, S. J., 1971: Cause Effect Analysis for Multiple Fault Detection in Combinational Networks, IEEE TC, C-20, No. 11, pp. 1252–1263.
Bossen, D. C., Hsiao, M. Y., 1980: A System Solution to the Memory Soft Error Problem, IBM J. R&D, No. 3, pp. 390–397.
Bossen, D. C., Hsiao, M. Y., 1982: Model for Transient and Permanent Error-detection & Fault-isolation Coverage, IBM J. R&D, 26, No. 1, pp. 67–77.
Bouricius, W. G., 1953: Operating Experience with the Los Alamos 701, EJCC, pp. 45–47.
Bouricius, W. G., et al, 1967: Investigations in the Design of an Automatically Repaired Computer, Dig. 1st IEEE Comp. Conf.
Bouricius, W. G., Carter, W. C., Schneider, P. R., 1969: Reliability Modeling Techniques for Self-Repairing Computer Systems, Proc. ACM Ann. Conf., pp. 295–309.
Bouricius, W. G., et al, 1971: Algorithms for Detection of Faults in Logic Circuits, IEEE TC, C-20, pp. 1258–1264.
Bouricius, W. G., Carter, W. C., Roth, J. P., Schneider, P. R., 1972: US Patent No. 3,665,173; Triple Modular Redundancy/Sparing.
Bouricius, W. G., Carter, W. C., Roth, J. P., Schneider, P. R., 1972: US Patent No. 3,665,174; Error Tolerant ALU.
Bouricius, W. G., Carter, W. C., Roth, J. P., Schneider, P. R., 1972: US Patent No. 3,665,175; Dynamic Storage Address Blocking to Achieve Error Toleration in Addressing Circuitry.
Bouricius, W. G., Carter, W. C., Roth, J. P., Schneider, P. R., 1972: US Patent No. 3,665,418; Status Switching in an Automatically Repaired Computer.
Bucholz, W., 1953: The System Design of The IBM 701 Computer, Proc. IRE, 41, pp. 1262–1275.
Bucholz, W., Ed., 1962: Planning a Computer System (Project Stretch), McGraw Hill.
Burks, A. W., Burks, A. R., 1981: The ENIAC: First General Purpose Electronic Computer, Annals Hist. Comp., V. 3, No. 4, pp. 310–399.
Burnstine, D. C., Eppard, W. H., 1966: Maintenance Strategy Diagramming Technique, 1966 Annual Symp. on Rel., pp. 75–83.
Carter, W. C., Mekota, J. E., 1954: Panel Discussion, Redundancy Checking for Small Digital Computers, EJCC, pp. 56–57.
Carter, W. C., 1957: A New Large Scale Data Handling System-DATAmatic 1000, ACM Symp. New Computers, A Report from the Manufacturers, pp. 36–57.
Carter, W. C., 1958: Automatic Machine and Program Testing Routines, 5th Annual Symp. on Comp. & Data Processing, U. Colorado, Boulder, Colorado.
Carter, W. C., et al, 1964: Design of Serviceability Features for the IBM System/360, IBM J. R&D, V. 8, No. 4, pp. 115–126.
Carter, W. C., Schneider, P. R., 1968: Design of Dynamically Checked Computer Systems, Inf. Proc. 68, IFIPS, pp. 878–883.
Carter, W. C., Jessep, D. C., Wadia, A. B., 1970a: Error-Free Decoding for Failure Tolerant Memories, Proc. 1st IEEE Comp. Group Conf. pp. 25–30.
Carter, W. C., et al, 1970b: Design Techniques for MARCS (Modular Architecture for Reliable Computer Systems), IBM RAI2.
Carter, W. C., Bouricius, W. G., 1971a: A Survey of Fault Tolerant Architecture and Its Evaluation, COMPUTER, Jan., pp. 10–16 (See Related Fault Tolerance papers in the issue).
Carter, W. C., Wadia, A. B., Jessep, D. C., 1971b: Implementation of Checkable Acyclic Automata by Morphic Boolean Functions, Pr. Smp. Cmp. & Auto., Poly. Tech. Inst. Brooklyn, pp. 466–482.
Carter, W. C., et al, 1971c: Logic Design for Dynamic and Interactive Recovery, IEEE TC, C-20, pp. 1300–1306.
Carter, W. C., Hsieh, E. P., Wadia, A. B., 1973: US Patent No. 3,766,521; Multiple b-Adjacent Group Correction and Detection Codes and Self-Checking Translators Therefor.
Carter, W. C., McCarthy, C. E., 1976: Implementation of an Experimental Fault-Tolerant Memory System, IEEE TC, pp. 557–568.
Carter, W. C., et al, 1977: Cost Effectiveness of Self-Checking Computer Design, Proc. FTCS-7, pp. 117–123.
Carter, W. C., Wadia, A. B., 1980: Design and Analysis of Codes and Their Self-checking Circuit Implementations for Correction and Detection of Multiple b-adjacent Errors, Proc. FTCS-10, pp. 35–40.
Carter, W. C., 1985: Chapter in Resilient Computing Systems, T. Anderson, Ed.
Chang, H. Y., Manning, E., Metze, G., 1970: Fault Diagnosis of Digital Systems, Wiley-Interscience, N. Y.
Chen, C. L., Hsiao, M. Y., 1984: Error-Correction Codes for Semiconductor Memory Applications: A State-of-the-Art Review, IBM J. R&D, 28, No. 2, pp. 124–134.
Clippinger, R. F., et al, 1953: The Programming of Stored Program Computers, SIAM Journal, V. 1, Nos. 1, 2, 3.
Cooper, A. E., Chow, W. T., 1976: Development of Onboard Space Computers, IBM J. R&D, 20, pp. 5–19.
Creveling, C. J., 1956: Increasing the Reliability of Electronic Equipment by the Use of Redundant Circuits, Proc. IRE, V. 44, pp. 509–515.
Davies, C. T., 1973: Recovery Semantics for a DB/DC System, Proc. ACM Annual Conf. pp. 136–144.
Davis, D. J., 1952: An Analysis of Failure Data, J. Am. Stat. Soc., No. 5, pp. 104–135.
Davis, M. E., 1953: Use of the Electronic Data-Processing Systems in the Life Insurance Business, EJCC, pp. 11–17.
Dickinson, M. M., et al, 1964: Saturn V Launch Vehicle Digital Computer & Adapter, FJCC, V. 26, pp. 501–516.
Eachus, J. J., 1953: Group Discussion on Diagnostic Checks, EJCC, p. 119.
Eichelberger, E. B., Williams, T. J., 1977: A Logic Design Structure for LSI Testability, Proc. D. A. Conf., pp. 462–468.
Eldred, R. D., 1959: Test Routines Based on Symbolic Logic Statements, JACM, V. 6, No. 1, pp. 33–36.
Epstein, B., Sobol, M., 1953: Life Testing, Journal of American Statistical Assoc., V. 48, No. 263, pp. 486–502.
E. R. A., 1950: High Speed Computing Devices, McGraw-Hill.
Estrin, G., 1953: The Electronic Computer at the Institute for Advanced Study, MTAC, 7, pp. 108–110.
Everett, R. R., et al, 1957: SAGE-A Data-Processing System for Air Defense, EJCC, pp. 148–155.
Falkoff, A. D., et al, 1964: A Formal Description of System/360 IBM Sys. J. V. 3, No. 3, pp. 193–262.
Fitzsimons, R. M., 1972: TRIDENT-A New Maintenance Weapon, Proc. FJCC, 41, pp. 255–267.
Flehinger, B. J., 1958: Reliability Improvement through Redundancy at Various System Levels, IBM J. R&D, pp. 223–245.
Forbes, R. E., et al, 1965: A Self-Diagnosable Computer, FJCC, V. 27, Part 1, pp. 1073–1087.
Forrester, J. W., 1951: Digital Information Storage in Three Dimensions Using Magnetic Cores, J. Ap. Physics, pp. 44–48.
Fox, J. L., 1975: Availability Design of the S/370 Model 168 Multiprocessor, Proc. 2nd USA-Japan Comp. Conf. pp. 52–57.
Franaszek, P. E., 1972: US Patent No. 3,689,899; Run-length-limited Variable Length Coding with Error Propagation Limitation.
Gluck, S., 1965: Impact of Scratchpads in Design: Multifunctional Scratchpad Memories in the Burroughs B8500, FJCC, pp. 661–667.
Goel, P., 1980: An Implicit Enumeration Algorithm to Generate Tests for Combinational Logic Circuits, FTCS-10, pp. 145–151.
Goldberg, J., et al, 1972: Survey of Fault Tolerant Computing Systems, SRI Inc. Report.
Goldstine, H. H., von Neumann, J., 1947: Planning & Coding of Problems for an Electronic Computing Instrument, Inst. of Advanced Study, Princeton.
Griesmer, J. H., Miller, R. E., Roth, J. P., 1962: The Design of Digital Circuits to Eliminate Catastrophic Failures, Redundancy Tech. for Comp. Sys., Spartan Books.
Hackl, F. J., Shirk, R. W., 1965: An Integrated Approach to Automated Computer Maintenance, IEEE Conf. Rec. on Switching Theory & Logic Des., pp. 289–302.
Hamming, R. W., 1950: Error Detecting & Error Correcting Codes, BSTJ, 29, pp. 147–160.
Hardie, F., Suhocki, R. S., 1967: Design & Use of Fault Simulation for Saturn Computer Design, IEEE TC, pp. 412–429.
Harrison, T. J., et al, 1981: Evolution of Small Real-Time IBM Computer Systems, IBM J. R&D, V. 25, pp. 441–453.
Harvard Proc., 1949: Proc. of a 2nd Symp. on Large Scale Digital Calculating Machinery, Annals Comp. Lab., V. X XII.
Hsiao, M. Y., 1970: A Class of Optimal Minimum Odd-Weight-Column SEC/DED codes, IBM J. R&D, V. 14, pp. 395–403.
Hsiao, M. Y., et al, 1981: Reliability, Availability, & Serviceability of IBM Computer Systems: A Quarter Century of Progress, IBM J. R&D, V. 25, pp. 453–465.
Ibarra, O. H., Sahni, S. J., 1975: Polynomially Complete Fault Detection Problems, IEEE TC, C-24, pp. 242–253.
James, S. E., 1981: Evolution of Real-Time Computer Systems for Manned Spaceflight, IBM J. R&D, V. 25, pp. 417–429.
Jarema, D. R., Sussenguth, E. H., 1981: IBM Data Communications: A Quarter Century of Evolution & Progress, IBM J. R&D, V. 25, pp. 391–405.
Keeler, J., 1967: Special Issue on IBM 9020, IBM Sys. J. 6, No. 2.
Kopp, R., 1953: Experience with the Air Force UNIVAC, EJCC, pp. 62–67.
Lancto, D. C., Rockefeller, R. L., 1967: The Operational Error Analysis Program, IBM Sys. J., 6, No. 2, pp. 103–149.
Laprie, J.-C., 1985: Dependable Computing & Fault Tolerance: Concepts & Terminology, Proc. FTCS-15, pp. 2–14.
Mauchly, J. W., 1953: The Advantages of Built-in Checking, EJCC, pp. 99–101.
Metropolis, N., Worlton, J., 1980: A Trilogy on Errors in Computing History, Annals Hist. Comp., V. 2, No. 1, pp. 49–59. (Excellent list of 93 references).
Moore, E. F., 1956: Gedanken-Experiments on Sequential Machines, Automata Studies, Princeton, pp. 129–156.
Moore, E. F., Shannon, C. E., 1956: Reliable Circuits Using Less Reliable Relays, J. Franklin Inst., pp. 191–208; 281–297.
Murray, F. J., 1953: Acceptance Test for the Raytheon Hurricane Computer, EJCC, pp. 48–52 (RAYDAC).
No. 1 ESS Issues, 1964: BSTJ, V. 43, No. 5, pp. 1831–2610.
OSVS2, 1985: OSVS2 MVS Overview, No. GC20-0954-0, IBM Brnch Of.
Patel, A. M., Hong, S. J., 1974: Optimal Rectangular Code for High Density Magnetic Tapes, IBM J. R&D, 18, pp. 579–588.
Perry, M. N., Plügge, W. P., 1961: American Airlines ‘SABRE’ Electronics Reservation System, WJCC, pp. 563–601.
Peterson, W. W., 1961: Error Correcting Codes, MIT Press.
Preiss, R. J., 1965: The Use of Fault Location Tests in Prototype Bring-up, Proc. IFIP65, pp. 511–517.
Preiss, R. J., 1972: Design Automation of Digital Systems, M. E. Breuer, Ed., V. 1, pp. 335–410.
Preparata, F. P., Metze, G., Chien, R. T., 1967: On the Connection Assignment Problem of Diagnosable Systems, IEEE TC, C-16, No. 6, pp. 848–854.
Proceedings of the ACM Conference, 1952: Pittsburgh, Pa. Several papers on the History of Computing, pp. 1–32.
Putzulu, G. R., Roth, J. P., 1971: An Heuristic Algorithm for the Testing of Asynchronous Circuits, IEEE TC, pp. 639–648.
Ralston, A., 1976: The Encyclopedia of Computer Science, McGraw-Hill, N. Y.
Randell, B., Ed., 1973: The Origins of Digital Computers, Springer-Verlag.
Randell, B., 1981: Comments on Burks, A. W., 1981.
Raymond, G. A., 1958: A Transistor-Circuit Chassis for High Reliability in Missile Guidance Systems, EJCC, pp. 132–135.
Reed, I. S., 1954: A Class of Multiple-error-correcting Codes and Their Decoding Scheme, Trans. IRE, IT-4, pp. 38–40.
Roth, J. P., 1966: Diagnosis of Automata Failures: A Calculus and a Method, IBM J. R&D, pp. 278–291.
Roth, J. P., Bouricius, W. G., Carter, W. C., Schneider, P. R., 1967: Phase II of an Architectural Study for a Self-Repairing Computer, SAMSO TR-67–106.
Schneider, P. R., 1967: On the Necessity to Examine D-Chains in Diagnostic Test Generation-An Example, IBM J. R&D, p. 114.
Sellers, F. F., Hsiao, M. Y., Bearnson, L. W., 1968a: Error Detecting Logic for Digital Computers, McGraw-Hill, N. Y.
Sellers, F. F., Hsiao, M. Y., Bearnson, L. W., 1968b: Analyzing Errors with the Boolean Difference, IEEE TC, pp. 676–683.
Shannon, C., 1938: A Symbolic Analysis of Relay & Switching Circuits, AIEE Trans., 57, pp. 713–723.
Shepe, P. D. Jr., Kirsch, R. A., 1953: SEAC-Review of Three Years of Operation, EJCC, pp. 83–90.
Shiowitz, M., et al, 1956: Functional Description of the NCR 304 Data Processing System for Business Applications, EJCC, pp. 34–39.
Smith, J. E., Lam, P., 1983: A Theory of Totally Self-Checking System Design, IEEE TC, pp. 491–499.
Snyder, S. S., 1980: Computer Advances Pioneered by Cryptologic Organizations, Ann. Hist. Comp., V. 2, No. 1, pp. 60–71.
Stanga, D. C., 1967: UNIVAC 1108 Multiprocessor System, AFIPS, SJCC, pp. 45–51.
Tryon, J. G., 1962: Quadded Logic, Redundancy Techniques for Computing Systems, Spartan Books, pp. 205–228.
von Neumann, J., 1956: Probabilistic Logics & the Synthesis of Reliable Organisms from Unreliable Components, Automata Studies, Princeton, pp. 43–97.
Wadia, A. B., 1970: Investigation into the Design of Dynamically Checked Arithmetic Units, Ph. D. Thesis, Harvard.
Walters, L. R., 1953: Diagnostic Programming Techniques for the IBM Type 701 E. D. P. M., Conv. Rec., IRE Nat. Convention.
Weik, M. H., 1955: A Survey of Domestic Electronics Digital Computing Systems, BRL, Rpt. No. 971, Aberdeen Proving Ground, Md.
Weik, M. H., 1957: A 2nd Survey of Domestic Electronics Digital Computing Systems, BRL, Rpt. No. 971, Aberdeen Proving Ground, Md.
Weik, M. H., 1961: A 3rd Survey of Domestic Electronics Digital Computing Systems, BRL, Rpt. No. 971, Aberdeen Proving Ground, Md.
Weir, J. M., 1953: Reliability & Characteristics of the ILLIAC Electrostatic Memory, EJCC, pp. 72–77.
Wheeler, D. J., Robertson, J. E., 1953: Diagnostic Programs for the ILLIAC, Proc. IRE, V. 41, pp. 1320–1325.
Whitelock, L. D., 1953: Methods Used to Improve Reliability in Military Electronics Equipment, EJCC, pp. 31–33.
Wilkes, M. V., Wheeler, D. J., Gill, S., 1951: The Preparing of Programs for an Electronic Digital Computer, Addison-Wesley.
© 1987 Springer-Verlag/Wien
Carter, W.C. (1987). Experiences in Fault Tolerant Computing, 1947 – 1971. In: Avižienis, A., Kopetz, H., Laprie, JC. (eds) The Evolution of Fault-Tolerant Computing. Dependable Computing and Fault-Tolerant Systems, vol 1. Springer, Vienna. https://doi.org/10.1007/978-3-7091-8871-2_1
DOI: https://doi.org/10.1007/978-3-7091-8871-2_1
Publisher Name: Springer, Vienna
Print ISBN: 978-3-7091-8873-6
Online ISBN: 978-3-7091-8871-2
eBook Packages: Springer Book Archive