Removal of all faulty nodes from a fault-tolerant service by means of distributed diagnosis with imperfect fault coverage

  • Session 9 System Level Diagnosis
  • Conference paper
Dependable Computing — EDCC-2 (EDCC 1996)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1150))

Abstract

In general, offering a fault-tolerant service amounts to executing replicas of a service process on different nodes of a distributed system. The service is fault-tolerant in the sense that it is still performed correctly even if some of the nodes hosting a replica behave maliciously. To guarantee the correctness of a fault-tolerant service despite the presence of maliciously behaving nodes, it is of key importance that all faulty nodes are removed from the service in a timely manner. Faulty nodes are detected by tests performed by the nodes offering the service. In practice, tests always have imperfect fault coverage. This paper describes a distributed diagnosis algorithm with imperfect tests, by means of which all detectably faulty nodes are removed from a fault-tolerant service. Inevitably, this may also entail the removal of a number of correctly functioning nodes from the service. The maximum number of correctly functioning nodes removed by the algorithm is calculated. Finally, the minimum number of nodes that a fault-tolerant service requires in order to perform this diagnosis algorithm is given.
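The abstract does not spell out the algorithm, so the following is only a toy illustration of why removing every detectably faulty node can cost some correct nodes as well. It is not the paper's method. It assumes a simple pairwise-removal rule: whenever one member accuses another, both are removed, because the remaining members cannot tell whether the tester or the tested node is the faulty one. It further assumes that a correct node never accuses another correct node, so each removed pair contains at least one faulty node. The function name `remove_accused_pairs` and the integer node identifiers are hypothetical.

```python
from typing import Iterable, Set, Tuple


def remove_accused_pairs(
    members: Iterable[int],
    accusations: Iterable[Tuple[int, int]],
) -> Set[int]:
    """Return the members left after applying pairwise removal.

    Each accusation is a (tester, tested) pair meaning "tester reports
    that tested failed a test". Assumption (not from the paper): an
    accusation between two current members removes both of them.
    """
    remaining = set(members)
    for tester, tested in accusations:
        # Ignore accusations that involve a node already removed,
        # and ignore self-accusations.
        if tester != tested and tester in remaining and tested in remaining:
            remaining.discard(tester)
            remaining.discard(tested)
    return remaining


if __name__ == "__main__":
    # Five replicas; suppose node 3 is faulty and node 0 detects it.
    # Node 0 is correct but is removed together with node 3.
    print(sorted(remove_accused_pairs(range(5), [(0, 3)])))  # [1, 2, 4]
```

Under these assumptions, at most one correct node is lost per faulty node, and a faulty node that no test ever exposes (and that never accuses anyone) stays in the service, which is why the guarantee concerns only detectably faulty nodes.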

Editor information

Andrzej Hlawiczka, João Gabriel Silva, Luca Simoncini

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Postma, A., Hartman, G., Krol, T. (1996). Removal of all faulty nodes from a fault-tolerant service by means of distributed diagnosis with imperfect fault coverage. In: Hlawiczka, A., Silva, J.G., Simoncini, L. (eds) Dependable Computing — EDCC-2. EDCC 1996. Lecture Notes in Computer Science, vol 1150. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61772-8_50

  • DOI: https://doi.org/10.1007/3-540-61772-8_50

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-61772-3

  • Online ISBN: 978-3-540-70677-9

  • eBook Packages: Springer Book Archive
