
Continual On-Line Diagnosis of Hybrid Faults

  • Conference paper

Part of the book series: Dependable Computing and Fault-Tolerant Systems (DEPENDABLECOMP, volume 9)

Abstract

An accurate determination of system state is essential to ensuring system dependability. An imprecise state assessment can lead either to catastrophic failure, through optimistic diagnosis, or to underutilization of resources, through pessimistic diagnosis. Dependability is usually achieved through a fault detection, isolation and reconfiguration (FDIR) paradigm, of which the diagnosis procedure is a primary component. Fault resolution in on-line diagnosis is key to providing an accurate system-state assessment. Most diagnostic strategies are based on limited fault models that adopt either an optimistic bias (all faults stuck-at-X, s-a-X) or a pessimistic bias (all faults Byzantine). Our Hybrid Fault-Effects Model (HFM) handles a continuum of fault types, distinguished by their impact on system operations. While this approach has been shown to enhance system functionality and dependability, the HFM becomes practical only with on-line FDIR. In this paper, we develop a methodology that uses system-state information to provide continual on-line diagnosis and reconfiguration as an integral part of system operations. We present diagnosis algorithms applicable under the generalized HFM and introduce the notion of fault decay time. Our diagnosis approach is based primarily on monitoring the system's message traffic; unlike existing approaches, it requires no explicit test procedures.
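As a rough illustration of two ideas named in the abstract, fault types distinguished by their effect on the receivers of a message and a fault decay time, the Python sketch below classifies one sender's broadcast from what each receiver observed and keeps a per-node penalty that decays exponentially over time. It is not the authors' algorithm. The benign/symmetric/asymmetric split is the one commonly used in hybrid fault models, while `classify`, `NodeHealth`, the penalty weights, the exponential decay and the exclusion threshold are all hypothetical choices made only for this example.

```python
import math
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class FaultClass(Enum):
    NONE = "fault-free"
    BENIGN = "benign"          # every receiver detects the error locally
    SYMMETRIC = "symmetric"    # every receiver gets the same wrong value
    ASYMMETRIC = "asymmetric"  # receivers observe different things


def classify(received: List[Optional[int]], voted: int) -> FaultClass:
    """Classify one sender's broadcast from what each receiver observed.

    None marks a missing or locally detectable (e.g. failed checksum) message.
    `voted` is assumed to be the value agreed on by the fault-free receivers.
    """
    if all(r is None for r in received):
        return FaultClass.BENIGN
    if len(set(received)) > 1:
        # Receivers disagree, including the case where only some detected an error.
        return FaultClass.ASYMMETRIC
    # Every receiver saw the same non-missing value.
    return FaultClass.NONE if received[0] == voted else FaultClass.SYMMETRIC


# Hypothetical, purely illustrative penalty weights per fault class.
WEIGHTS = {
    FaultClass.NONE: 0.0,
    FaultClass.BENIGN: 1.0,
    FaultClass.SYMMETRIC: 2.0,
    FaultClass.ASYMMETRIC: 4.0,
}


@dataclass
class NodeHealth:
    decay_time: float   # hypothetical fault decay time constant (must be > 0)
    threshold: float    # penalty at which exclusion is recommended
    penalty: float = 0.0
    last_t: float = 0.0

    def observe(self, t: float, fc: FaultClass) -> bool:
        """Decay old evidence to time t, add the new penalty, report exclusion."""
        self.penalty *= math.exp(-(t - self.last_t) / self.decay_time)
        self.last_t = t
        self.penalty += WEIGHTS[fc]
        return self.penalty >= self.threshold
```

With `decay_time=10.0` and `threshold=5.0`, an asymmetric fault at t=0 followed by benign faults at t=1 and t=2 crosses the threshold on the third observation, whereas an isolated fault is essentially forgotten after a few decay times. The decay lets transient faults age out instead of permanently condemning a node, which is one plausible reading of the fault decay time idea rather than the paper's definition.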

Work supported in part by ONR # N00014-91-C-0014




Copyright information

© 1995 Springer-Verlag/Wien

About this paper

Cite this paper

Walter, C.J., Suri, N., Hugue, M.M. (1995). Continual On-Line Diagnosis of Hybrid Faults. In: Cristian, F., Le Lann, G., Lunt, T. (eds) Dependable Computing for Critical Applications 4. Dependable Computing and Fault-Tolerant Systems, vol 9. Springer, Vienna. https://doi.org/10.1007/978-3-7091-9396-9_21


  • DOI: https://doi.org/10.1007/978-3-7091-9396-9_21

  • Publisher Name: Springer, Vienna

  • Print ISBN: 978-3-7091-9398-3

  • Online ISBN: 978-3-7091-9396-9

  • eBook Packages: Springer Book Archive
