Experimental Research in Reliable Computing at Carnegie Mellon University

  • Daniel P. Siewiorek
  • John P. Shen
  • Roy A. Maxion
Conference paper
Part of the Dependable Computing and Fault-Tolerant Systems book series (DEPENDABLECOMP, volume 1)

Abstract

The Carnegie Plan for higher education was developed in 1945. Its basic philosophy is “learning by doing”, and the strong emphasis on experimental research at Carnegie Mellon University (CMU) is an example of the plan in operation. Research in reliable computing at CMU has spanned three decades. In the early 1960s, Westinghouse Corporation in Pittsburgh had an active research program in the use of redundancy to enhance system reliability. William Mann, who had been associated with Carnegie Mellon University, was one of those researchers. In 1962, a symposium on redundancy techniques was held in Washington, D.C.; it led to the first comprehensive book on redundancy and reliability, of which Mann was a coauthor [73]. One of the papers in that volume, on adaptive voting, was written by CMU’s Professor William H. Pierce [41]. Professor Pierce later published one of the first textbooks on redundancy [42].

Keywords

Transient Fault, Reliable Computing, Transient Error, Crossbar Switch, Decrease Failure Rate

References

  1. [1]
    Accetta, M., R. Baron, W. Bolosky, D. Golub, R. Rashid, A. Tevanian, and M. Young, Mach: A New Kernel Foundation for UNIX Development, In Proceedings of Summer Usenix, Atlanta, July 1986.
  2. [2]
    Barbacci, M., Instruction Set Processor Specifications (ISPS): The Notation and Its Application, In IEEE Transactions on Computers, vol. C-30, nr. 1, pp. 24–40, January 1981.
  3. [3]
    Barbacci, M., G. Barnes, R. Cattell, and D. P. Siewiorek, The ISPS Computer Description Language, Carnegie Mellon University, Department of Computer Science Technical Report, 1977.
  4. [4]
    Bell, C. G. and A. Newell, Computer Structures: Readings and Examples, McGraw-Hill, New York, 1971.
  5. [5]
    Bhandarkar, D. P., Analytic Models for Memory Interference in Multiprocessor Computer Systems, PhD thesis, Carnegie Mellon University, Pittsburgh, Pennsylvania, September 1972.
  6. [6]
    Bloch, J. J., D. S. Daniels, and A. Z. Spector, Weighted Voting for Directories: A Comprehensive Study, Technical Report CMU-CS-84–114, April 1984.
  7. [7]
    Castillo, X., A Compatible Hardware/Software Reliability Prediction Model, PhD thesis, Carnegie Mellon University, July 1981, Also Department of Computer Science Technical Report.
  8. [8]
    Castillo, X. and D. P. Siewiorek, A Workload Dependent Software Reliability Prediction Model, In 12th International Fault Tolerant Computing Symposium, June 1982.
  9. [9]
    Castillo, X., S. R. McConnel, and D. P. Siewiorek, Derivation and Calibration of a Transient Error Reliability Model, IEEE Transactions on Computers, vol. C-31, nr. 7, pp. 658–671, July 1982.
  10. [10]
    Clune, E., Z. Segall, and D. P. Siewiorek, Validation of Fault-Free Behavior of a Reliable Multiprocessor System, FTMP: A Case Study, In International Conference on Automation and Computer Control, San Diego, CA, June 6–8, 1984.
  11. [11]
    Clune, E., Z. Segall, and D. Siewiorek, Fault-Free Behavior of Reliable Multiprocessor Systems: FTMP Experiments in AIR-LAB, NASA Contractor Report 177967, Grant NAG1–190, Carnegie Mellon University, August 1985.
  12. [12]
    Czeck, E. W., F. E. Feather, A. M. Grizzaffi, G. B. Finelli, Z. Z. Segall, and D. P. Siewiorek, Fault-Free Performance Validation of Avionic Multiprocessors, In 7th Digital Avionic Systems Conference, Dallas, Texas, October 1986.
  13. [13]
    Daniels, D. S. and A. Z. Spector, An Algorithm for Replicated Directories, In Proceedings of the Second Annual Symposium on Principles of Distributed Computing, pp. 104–113, Montreal, Canada, August 1983.
  14. [14]
    Eifert, J. and J. P. Shen, Processor Monitoring Using Asynchronous Signatured Instruction Streams, In Proceedings of 14th International Fault-Tolerant Computing Symposium, June 1984.
  15. [15]
    Elkind, S. A., LAMBDA User Manual, Carnegie Mellon University, 1983.
  16. [16]
    Elkind, S. A., Approaches to Reliable Systems Design, PhD thesis, Carnegie Mellon University, May 1985.
  17. [17]
    Elkind, S. and D. P. Siewiorek, Reliability and Performance of Error-Correcting Memory and Register Arrays, In IEEE Transactions on Computers, vol. 29, nr. 10, pp. 920–927, October 1980.
  18. [18]
    Feather, F., D. Siewiorek, and Z. Segall, Validation of a Fault-Tolerant Multiprocessor: Baseline Experiments and Workload Implementation, Technical Report CMU-CS-85–145, Carnegie Mellon University, July 1985.
  19. [19]
    Feather, F., D. P. Siewiorek, and Z. Segall, Validation of a Fault-Tolerant Multiprocessor: Baseline Experiments and Workload Implementation, Technical Report CMU-CS-85–127, Carnegie Mellon University, July 1985.
  20. [20]
    Ferguson, F. J. and J. P. Shen, The Design of Two Easily-Testable Array Multipliers, In Proceedings of the 6th Computer Arithmetic Conference, June 1983.
  21. [21]
    Gehringer, E. F., D. P. Siewiorek, and Z. Segall, Parallel Processing: The Cm* Experience, Digital Press, Bedford MA, 1987.
  22. [22]
    Guise, D., D. P. Siewiorek, and W. P. Birmingham, DEMETER: A Design Methodology and Environment, In Proceedings of the IEEE International Conference on Computer Design/VLSI in Computers, 1983.
  23. [23]
    Kini, V., Automatic Generation of Reliability Functions for Processor-Memory-Switch Structures, PhD thesis, Carnegie Mellon University, Department of Electrical Engineering, February 1981.
  24. [24]
    Kini, V. and D. P. Siewiorek, Automatic Generation of Symbolic Reliability Functions for Processor-Memory-Switch Structures, IEEE Transactions on Computers, vol. C-31, nr. 8, pp. 752–771, August 1982.
  25. [25]
    Lai, K. W., Functional Testing of Digital Systems, PhD thesis, Carnegie Mellon University, Department of Computer Science, 1981.
  26. [26]
    Lai, K. W. and D. P. Siewiorek, Functional Testing of Digital Systems, In 20th Design Automation Conference Proceedings, Miami Beach, FL, June 27–29, 1983.
  27. [27]
    Lee, D. C. H. and J. P. Shen, Easily-Testable (N,K) Shuffle-Exchange Networks, In Proceedings of International Conference on Parallel Processing, August 1983.
  28. [28]
    Lin, T-T. Y. and D. P. Siewiorek, Architectural Issues for On-Line Diagnostics in a Distributed Environment, In IEEE International Conference on Computer Design, Port Chester NY, October 1986.
  29. [29]
    Maly, W., F. J. Ferguson, and J. P. Shen, Systematic Characterization of Physical Defects for Fault Analysis of MOS IC Cells, In Proceedings of International Test Conference, October 1984.
  30. [30]
    Mashburn, H. H., The C.mmp/Hydra Project: An Architectural Overview, In Siewiorek, D. P., C. G. Bell, and A. Newell, Computer Structures: Principles and Examples, pp. 350–370, McGraw-Hill, New York, 1982.
  31. [31]
    Maxion, R. A., Distributed Diagnostic Performance Reporting and Analysis, In IEEE International Conference on Computer Design, Port Chester NY, October 1986.
  32. [32]
    Maxion, R. A., Toward Fault-Tolerant User Interfaces, In Proceedings of the 5th IFAC International Conference on Achieving Safe Real-Time Computing Systems (SAFECOMP-86), Sarlat, France, October 1986.
  33. [33]
    Maxion, R. A., Human and Machine Diagnosis of Computer Hardware Faults, IEEE Computer Society Workshop on Reliability of Local Area Networks, South Padre Island, Texas, February 1982.
  34. [34]
    McConnel, S. R., Analysis and Modeling of Transient Errors in Digital Computers, PhD thesis, Carnegie Mellon University, Department of Electrical Engineering, June 1981, Also Department of Computer Science Technical Report.
  35. [35]
    McConnel, S. R. and D. P. Siewiorek, The CMU Voter Chip, Technical Report, Carnegie Mellon University, Department of Computer Science, 1980.
  36. [36]
    McConnel, S. R. and D. P. Siewiorek, Synchronization and Voting, In IEEE Transactions on Computers, vol. C-30, nr. 2, pp. 161–164, February 1981.
  37. [37]
    McConnel, S. R., D. P. Siewiorek, and M. M. Tsao, Transient Error Data Analysis, Technical Report, Carnegie Mellon University, Department of Computer Science, May 1979.
  38. [38]
    NASA-Langley Research Center, Validation Methods for Fault-Tolerant Avionics and Control Systems-Working Group Meeting I, NASA Conference Publication 2114, Research Triangle Institute, 1979.
  39. [39]
    NASA-Langley Research Center, Validation Methods for Fault-Tolerant Avionics and Control Systems-Working Group Meeting II, NASA Conference Publication 2130, Research Triangle Institute, 1979.
  40. [40]
    Perq System Overview, March Edition, Perq Systems Corporation, Pittsburgh, Pennsylvania, 1984.
  41. [41]
    Pierce, W. H., Adaptive Vote-Takers Improve the Use of Redundancy, In Wilcox, R. H. and W. C. Mann, Redundancy Techniques for Computing Systems, pp. 229–250, Spartan Books, Washington, D.C., 1962.
  42. [42]
    Pierce, W. H., Failure Tolerant Design, Academic Press, New York, 1965.
  43. [43]
    Rashid, R. and G. G. Robertson, Accent: A Communication-Oriented Network Operating System Kernel, Computer Science Department Technical Report, Carnegie Mellon University, 1981.
  44. [44]
    Robinson, S. H. and J. P. Shen, Switch-Level Automatic Test Generation for CMOS Circuits, In Proceedings of International Conference on Computer-Aided Design, November 1985.
  45. [45]
    Schuette, M. A., J. P. Shen, D. P. Siewiorek, and Y. X. Zhu, An Experimental Evaluation of Two Concurrent Error Detection Approaches, In Proceedings of 16th International Fault-Tolerant Computing Symposium, July 1986.
  46. [46]
    Schwarz, P. M., Transactions on Typed Objects, PhD thesis, Computer Science Department, Carnegie Mellon University, December 1984, Available as Technical Report CMU-CS-84–166, Carnegie Mellon University.
  47. [47]
    Shen, J. P. and F. J. Ferguson, Easily-Testable Array Multipliers, In Proceedings of 13th International Fault-Tolerant Computing Symposium, June 1983.
  48. [48]
    Shen, J. P. and F. J. Ferguson, The Design of Easily-Testable VLSI Array Multipliers, In IEEE Transactions on Computers, June 1984.
  49. [49]
    Shen, J. P. and M. A. Schuette, On-Line Self-Monitoring Using Signatured Instruction Streams, In Proceedings of International Test Conference, October 1983.
  50. [50]
    Shen, J. P. and M. A. Schuette, Processor Control Flow Monitoring Using Signatured Instruction Streams, IEEE Transactions on Computers, 1986.
  51. [51]
    Shen, J. P. and S. P. Tomas, A Roving Monitoring Processor for Detection of Control Flow Errors in Multiple Processor Systems, In Microprocessing and Microprogramming: The Euromicro Journal, 1986.
  52. [52]
    Shen, J. P., W. Maly, and F. J. Ferguson, Inductive Fault Analysis of MOS Integrated Circuits, In IEEE Design and Test of Computers, December 1985.
  53. [53]
    Shombert, L., The C.vmp Statistics Board Experiment, Master’s thesis, Carnegie Mellon University, Department of Electrical Engineering, 1981.
  54. [54]
    Shombert, L. A., Using Redundancy for Testable and Repairable Systolic Arrays, PhD thesis, Carnegie Mellon University, Pittsburgh, Pennsylvania, 1985.
  55. [55]
    Siewiorek, D. P., Architecture of Fault-Tolerant Computers, In Computer, vol. 17, nr. 8, pp. 9–18, August 1984.
  56. [56]
    Siewiorek, D. P., Architecture of Fault-Tolerant Computers, In D. K. Pradhan, Fault-Tolerant Computing: Theory and Techniques, Vol. II, pp. 417–466, Prentice-Hall, Englewood Cliffs, N.J., 1986.
  57. [57]
    Siewiorek, D. P. and K. W. Lai, Testing of Digital Systems, In Proceedings of the IEEE, pp. 1321–1333, October 1981.
  58. [58]
    Siewiorek, D. P. and S. R. McConnel, C.vmp: The Implementation, Performance, and Reliability of a Fault-Tolerant Multiprocessor, In Proceedings of the Third US-Japan Computer Conference, October 1978.
  59. [59]
    Siewiorek, D. P. and R. S. Swarz, The Theory and Practice of Reliable System Design, Digital Press, Bedford MA, 1982.
  60. [60]
    Siewiorek, D., V. Kini, R. Joobbani, and H. Bellis, A Case Study of C.mmp, Cm* and C.vmp: Part II. Predicting and Calibrating Reliability of Multiprocessor Systems, In Proceedings of the IEEE, vol. 66, nr. 10, pp. 1200–1220, October 1978.
  61. [61]
    Siewiorek, D. P., C. G. Bell, R. C. Chen, S. H. Fuller, J. Grason, and S. Rege, The Architecture and Applications of Computer Modules: A Set of Components for Digital Systems Design, In Proceedings of the 1973 COMPCON Conference, pp. 177–180, San Francisco, CA, February 1973.
  62. [62]
    Siewiorek, D. P., V. Kini, H. Mashburn, S. McConnel, and M. Tsao, A Case Study of C.mmp, Cm* and C.vmp: Part I. Experiences with Fault Tolerance in Multiprocessor Systems, In Proceedings of the IEEE, vol. 66, pp. 1178–1199, October 1978.
  63. [63]
    Siewiorek, D. P., M. Canepa, and S. Clark, C.vmp: The Architecture and Implementation of a Fault-Tolerant Multiprocessor, In 7th International Symposium on Fault-Tolerant Computing, Los Angeles CA, June 1977.
  64. [64]
    Spector, A. Z., J. Butcher, D. S. Daniels, D. J. Duchamp, J. L. Eppinger, C. E. Fineman, A. Heddaya, and P. M. Schwarz, Support for Distributed Transactions in the TABS Prototype, In IEEE Transactions on Software Engineering, vol. SE-11, nr. 6, pp. 520–530, June 1985.
  65. [65]
    Spector, A. Z., D. S. Daniels, D. J. Duchamp, J. L. Eppinger, and R. Pausch, Distributed Transactions for Reliable Systems, In Proceedings of the Tenth Symposium on Operating System Principles, December 1985.
  66. [66]
    Swan, R. J., S. H. Fuller, and D. P. Siewiorek, Cm*: A Modular, Multi-Microprocessor, In AFIPS: Proceedings of the National Computer Conference, June 1977.
  67. [67]
    Tomas, S. P. and J. P. Shen, A Roving Monitoring Processor for Detection of Control Flow Errors in Multiple Processor Systems, In Proceedings of the International Conference on Computer Design, October 1985.
  68. [68]
    Toy, W. N., Fault-Tolerant Design of Local ESS Processors, In Proceedings of the IEEE, vol. 66, nr. 10, pp. 1126–1145, October 1978.
  69. [69]
    Tsao, M. M., Transient Error and Fault Prediction, PhD thesis, Carnegie Mellon University, Department of Electrical Engineering, January 1981.
  70. [70]
    Tsao, M. M. and D. P. Siewiorek, Trend Analysis on System Error Files, In 13th International Fault Tolerant Computing Symposium, Milan, Italy, June 1983.
  71. [71]
    Tsao, M. M., A. W. Wilson, R. C. McGarity, C-J. Tseng, and D. P. Siewiorek, The Design of C.Fast: A Single Chip Fault-Tolerant Microprocessor, In 12th International Fault-Tolerant Computing Symposium, Santa Monica, CA, June 1982.
  72. [72]
    U. S. Department of Defense, Military Standardization Handbook: Reliability Prediction of Electronic Equipment, MIL-STD-HDBK-217B, Notice 1, 1976.
  73. [73]
    Wilcox, R. H. and W. C. Mann, Redundancy Techniques for Computing Systems, Spartan Books, Washington, D.C., 1962.
  74. [74]
    Wulf, W. A. and C. G. Bell, C.mmp: A Multi-Mini-Processor, In AFIPS Conference Proceedings, vol. 41, pp. 765–777, Montvale, NJ, 1972.
  75. [75]
    Wulf, W. A., R. Levin, and S. Harbison, Hydra/C.mmp: An Experimental Computer System, McGraw-Hill, New York, 1980.
  76. [76]
    York, G., D. P. Siewiorek, and Y. X. Zhu, Compensating Faults in Triple Modular Redundancy, In Proceedings of the Fifteenth International Symposium on Fault-Tolerant Computing, June 19–21, 1985.

Copyright information

© Springer-Verlag/Wien 1987

Authors and Affiliations

  • Daniel P. Siewiorek (1, 2)
  • John P. Shen (2)
  • Roy A. Maxion (1)
  1. Department of Computer Science, Carnegie Mellon University, Pittsburgh, USA
  2. Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, USA
