On-Line Fault Monitoring

  • Chapter
On-Line Testing for VLSI

Part of the book series: Frontiers in Electronic Testing ((FRET,volume 11))

Abstract

Sequoia’s fault-tolerant computers were designed under rigid constraints: no single hardware malfunction may generate an undetected error; an integrated circuit is treated as a “black box” that can fail in arbitrary ways, affecting an arbitrary subset of its input and output signals; and faults can be transient or intermittent, with arbitrary durations and repetition intervals. Moreover, the incremental hardware used to achieve these goals was to be kept to a minimum. The resulting computers do, to a very large extent, satisfy these constraints. To achieve this, a combination of fault-monitoring techniques was used, including bit and nibble error-correcting and error-detecting codes; byte parity codes with orthogonal partitioning; cyclic-residue codes on I/O data transfers; codes designed to protect against address-counter overruns on I/O transfers; and lossless control-signal compactors. The nature of and rationale for these fault monitors are described, as well as the analytical and testing techniques used to estimate the resulting coverage.
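
As a rough illustration of why parity groups might be partitioned in more than one way, the sketch below computes two sets of parity bits over a 32-bit word: one per byte, and one per bit position taken across the bytes. The word width, the grouping, and all function names are assumptions made for this example; the abstract does not specify Sequoia's actual layout, and this is not a reconstruction of it.

```python
# Hypothetical illustration only: the 32-bit width and both groupings are
# assumptions for this sketch, not the design described in the chapter.

def parity(bits):
    """Return the XOR (even-parity bit) of an iterable of 0/1 values."""
    p = 0
    for b in bits:
        p ^= b
    return p

def word_bits(word, width=32):
    """List the bits of `word`, least significant bit first."""
    return [(word >> i) & 1 for i in range(width)]

def byte_parities(word):
    """One parity bit per 8-bit byte (first partition)."""
    bits = word_bits(word)
    return [parity(bits[8 * k:8 * k + 8]) for k in range(4)]

def column_parities(word):
    """One parity bit per bit position, taken across the four bytes
    (a second partition that shares at most one bit with each byte group)."""
    bits = word_bits(word)
    return [parity(bits[i::8]) for i in range(8)]

word = 0x12345678
double_error = word ^ 0b11      # two flipped bits inside the same byte
single_error = word ^ (1 << 13) # one flipped bit in the second byte

# A double error within one byte is invisible to that byte's parity bit...
print(byte_parities(word) == byte_parities(double_error))
# ...but it changes two of the cross-byte parity bits.
print(column_parities(word) == column_parities(double_error))
```

In this toy layout, any two bits that share a byte group fall into different cross-byte groups, so an error pattern confined to one byte that escapes the first check still disturbs the second.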

Copyright information

© 1998 Springer Science+Business Media New York

About this chapter

Cite this chapter

Stiffler, J.J. (1998). On-Line Fault Monitoring. In: Nicolaidis, M., Zorian, Y., Pradhan, D.K. (eds) On-Line Testing for VLSI. Frontiers in Electronic Testing, vol 11. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-6069-9_2

  • DOI: https://doi.org/10.1007/978-1-4757-6069-9_2

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-5033-8

  • Online ISBN: 978-1-4757-6069-9

  • eBook Packages: Springer Book Archive
