Synchronous Do-All with Crashes and Restarts
In general-purpose distributed computation in dynamic environments, it is important to be able to deal, and efficiently so, with processors failing, then restarting and rejoining the system. Here we consider the Do-All problem of performing n tasks in a message- passing distributed environment consisting of p processors that are subject to failures and restarts. Failures are crashes, i.e., a crashed processor stops and does not perform any further actions until, and if, it restarts. Restarted pro- cessors resume computation in a predefined initial state, and no stable storage is assumed. The distributed environment is synchronous and the underlying network is fully connected, so that any processor can send a message to any other processor. Messages are not lost in transit or corrupted. Because the system is synchronous we also assume that there is a known upper bound on message delivery time. It is convenient to assume that messages sent within one step of a certain known duration are delivered before the end of the next such step. The efficiency of algorithms is evaluated in terms of total-work and message complexities.
KeywordsMessage Complexity Local View Operational Processor Local Step Report Message
Unable to display preview. Download preview PDF.