Fault Tolerance in Distributed Computing
A distributed system consists of several independent processing components that interact with each other over a communication network, which is itself built from communication components. Distributed computing refers to the algorithmic control of the distributed system’s processing components by means of a distributed program in order to reach a collective goal, that is, to provide a certain service. Unfortunately, the components of virtually every system are imperfect and therefore prone to failures that may render the system unable to provide the service. In order to tolerate the failure of some components, that is, to keep the service available despite these failures, the system must be equipped with redundancy in space and time. The former refers to redundant components that take over the role of failed components. The latter refers to the additional overhead required to manage these components. Fault-tolerant distributed computing refers to the algorithmic control of the distributed system’s components so as to provide the desired service despite the presence of certain failures in the system by exploiting redundancy in space and time.
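The two kinds of redundancy can be illustrated with a minimal sketch. It is not taken from the chapter: the names `call_with_failover`, `ComponentFailure`, `healthy`, and `crashed` are hypothetical, and components are modelled as plain Python callables that raise an exception when they have failed. Redundancy in space appears as the list of replicas, and redundancy in time as the extra attempts spent before a working replica answers.

```python
from typing import Callable, Sequence, Tuple


class ComponentFailure(Exception):
    """Raised by a (hypothetical) component that has failed."""


def call_with_failover(
    replicas: Sequence[Callable[[str], str]], request: str
) -> Tuple[str, int]:
    """Try each redundant replica in turn until one answers.

    Returns the reply and the number of attempts made. Attempts beyond
    the first are the extra work (redundancy in time) spent on managing
    the redundant components (redundancy in space).
    """
    attempts = 0
    for replica in replicas:
        attempts += 1
        try:
            return replica(request), attempts
        except ComponentFailure:
            continue  # this replica failed; a redundant one takes over
    raise ComponentFailure(f"all {attempts} replicas failed")


# Two illustrative components: one that works and one that is down.
def healthy(request: str) -> str:
    return f"served: {request}"


def crashed(request: str) -> str:
    raise ComponentFailure("component is down")


# The service stays available although two of the three replicas failed.
reply, attempts = call_with_failover([crashed, crashed, healthy], "read x")
```

In this sketch the service tolerates up to two failed replicas out of three; if every replica fails, the failure is reported to the caller rather than masked.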
Keywords: Fault Tolerance, Read Operation, Triangular Grid, Failure Scenario, Sequential Consistency