Fail-Silent Hardware for Distributed Systems
distributed computations are assumed to be structured as software components communicating via messages;
XPA: software components execute on fail-controlled nodes with the fail-silent property: a node either functions according to the specification or stops functioning;
OSA: the fail-silent property is not essential for nodes, so software components can execute on ordinary, potentially fail-uncontrolled nodes; however, all message-passing protocols are executed on fail-silent hardware (the Network Attachment Controllers, NACs);
nodes communicate with each other through redundant communication networks;
software components can be replicated on distinct nodes for increased reliability; the degree of replication (if any) for a software component is determined by the failure characteristics of the underlying nodes: K+1 replicas can tolerate up to K replica failures if the nodes are assumed to be fail-silent, whilst 3K+1 replicas are needed if the nodes are assumed to be fail-uncontrolled.
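
The replication rule in the last item can be illustrated with a small calculation. The following is only a sketch of the K+1 versus 3K+1 arithmetic stated above; the function names `replicas_needed` and `tolerable_failures` are hypothetical and do not come from the original text.

```python
def replicas_needed(k: int, fail_silent: bool) -> int:
    """Minimum number of replicas needed to tolerate k replica failures.

    Hypothetical helper: K+1 for fail-silent nodes,
    3K+1 for fail-uncontrolled nodes, as stated in the text.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    return k + 1 if fail_silent else 3 * k + 1


def tolerable_failures(n: int, fail_silent: bool) -> int:
    """Inverse view: how many replica failures n replicas can tolerate."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n - 1 if fail_silent else (n - 1) // 3


if __name__ == "__main__":
    # Compare the replication cost of the two node assumptions.
    for k in range(0, 4):
        print(k, replicas_needed(k, True), replicas_needed(k, False))
```

For example, tolerating a single replica failure takes two replicas on fail-silent nodes but four on fail-uncontrolled nodes, which shows how the fail-silent assumption reduces the replication cost.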