Distributed Fault-Tolerance

  • David Powell
Part of the Research Reports ESPRIT book series (ESPRIT, volume 1)


Distribution and fault-tolerance are tightly related. Should a single element of a distributed system fail, users expect at worst a slight degradation of the service offered; distributed systems must therefore have at least some built-in fault-tolerance. Conversely, most fault-tolerant systems can, at some level, be seen as distributed systems because of their redundant processing resources. Distributed fault-tolerance is used here to refer to the class of techniques suitable for ensuring fault-tolerance in an architecture consisting of a set of processing elements (called nodes or stations) interconnected by a message-passing communication network (figure 1). The techniques discussed here are aimed at distributed systems in which the communication network consists of one or more local area networks. In particular, the existence of high-bandwidth broadcast channels allowing efficient multicast communication is assumed.


Keywords (added by machine, not by the authors): Software Component; Replication Technique; Data Message; Faulty Node; Input Message



Copyright information

© ECSC — EEC — EAEC, Brussels — Luxembourg 1991

Authors and Affiliations

  • David Powell (LAAS-CNRS, Toulouse, France)
