Continual On-Line Diagnosis of Hybrid Faults

Walter, C. J.; Suri, N.; Hugue, M. M.

doi:10.1007/978-3-7091-9396-9_21

Conference paper

Part of the book series: Dependable Computing and Fault-Tolerant Systems (DEPENDABLECOMP, volume 9)

Abstract

Accurate system-state determination is essential to system dependability. An imprecise state assessment can lead to catastrophic failure through optimistic diagnosis, or to underutilization of resources through pessimistic diagnosis. Dependability is usually achieved through a fault detection, isolation and reconfiguration (FDIR) paradigm, of which the diagnosis procedure is a primary component. Fault resolution in on-line diagnosis is key to providing an accurate system-state assessment. Most diagnostic strategies are based on limited fault models that adopt either an optimistic bias (all faults stuck-at-X, s-a-X) or a pessimistic bias (all faults Byzantine). Our Hybrid Fault-Effects Model (HFM) handles a continuum of fault types that are distinguished by their impact on system operations. While this approach has been shown to enhance system functionality and dependability, on-line FDIR is required to make the HFM practical. In this paper, we develop a methodology that uses system-state information to provide continual on-line diagnosis and reconfiguration as an integral part of system operations. We present diagnosis algorithms applicable under the generalized HFM and introduce the notion of fault decay time. Our diagnosis approach is based primarily on monitoring the system’s message traffic. Unlike existing approaches, it requires no explicit test procedures.
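
The idea of diagnosing from message traffic alone, with old evidence fading over time, can be pictured with a small sketch. This is not the paper's algorithm: the abstract does not enumerate the HFM fault classes or define fault decay time. The sketch assumes the three classes commonly used in hybrid fault models (benign, symmetric, asymmetric), and it assumes that "decay" means a fault observation stops counting against a node after a fixed number of rounds. The names `classify_round`, `DecayingSuspicion`, `decay_rounds`, and `threshold` are hypothetical.

```python
from typing import Dict, List, Optional

MISSING = None  # hypothetical marker: message absent or detectably malformed


def classify_round(observations: List[Optional[int]], expected: int) -> str:
    """Classify one sender from what every monitor received in one round.

    `observations` must be non-empty; `expected` stands in for the voted
    or otherwise agreed value (an assumption of this sketch).
    """
    if all(o is MISSING for o in observations):
        return "benign"        # every monitor detects the error locally
    if len(set(observations)) > 1:
        return "asymmetric"    # monitors received different things
    if observations[0] != expected:
        return "symmetric"     # the same wrong value reached every monitor
    return "ok"


class DecayingSuspicion:
    """Keep per-node fault observations and let them expire (assumed decay)."""

    def __init__(self, decay_rounds: int, threshold: int) -> None:
        self.decay_rounds = decay_rounds  # assumed lifetime of an observation
        self.threshold = threshold        # observations needed to suspect a node
        self.history: Dict[str, List[int]] = {}

    def record(self, node: str, round_no: int, verdict: str) -> None:
        # Only non-"ok" verdicts count as evidence against a node.
        if verdict != "ok":
            self.history.setdefault(node, []).append(round_no)

    def suspected(self, node: str, now: int) -> bool:
        # Drop observations older than the decay window, then count the rest.
        recent = [r for r in self.history.get(node, [])
                  if now - r < self.decay_rounds]
        self.history[node] = recent
        return len(recent) >= self.threshold


tracker = DecayingSuspicion(decay_rounds=5, threshold=2)
tracker.record("n1", 1, classify_round([9, 9, 9], expected=7))
tracker.record("n1", 2, classify_round([7, 9, 7], expected=7))
```

In this sketch no test message is ever sent: every verdict comes from values the monitors would have received during normal operation, which is the property the abstract emphasizes. The decay window keeps a node with a transient fault from being suspected indefinitely.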

Work supported in part by ONR # N00014-91-C-0014

Author information

Authors and Affiliations

AlliedSignal MTC, Columbia, Maryland, 21045, USA
C. J. Walter, N. Suri & M. M. Hugue

Editor information

Editors and Affiliations

University of California, La Jolla, CA, 92093-0114, USA
Flaviu Cristian
INRIA, F-78150, Le Chesnay, France
Gerard Le Lann (Research Director)
ARPA/CSTO, Arlington, VA, 22203, USA
Teresa Lunt (Program Manager)

About this paper

Cite this paper

Walter, C.J., Suri, N., Hugue, M.M. (1995). Continual On-Line Diagnosis of Hybrid Faults. In: Cristian, F., Le Lann, G., Lunt, T. (eds) Dependable Computing for Critical Applications 4. Dependable Computing and Fault-Tolerant Systems, vol 9. Springer, Vienna. https://doi.org/10.1007/978-3-7091-9396-9_21

DOI: https://doi.org/10.1007/978-3-7091-9396-9_21
Publisher Name: Springer, Vienna
Print ISBN: 978-3-7091-9398-3
Online ISBN: 978-3-7091-9396-9
eBook Packages: Springer Book Archive
