The ability to cooperatively perform a collection of tasks in a distributed setting is key to solving a broad range of computational problems, ranging from distributed search, such as SETI@home, to distributed simulation and multi-agent collaboration. Target distributed platforms for such applications consist of hundreds or even thousands of processing units, and encompass multiprocessor machines, clusters of workstations, wide-area networks, and network supercomputers, all in wide use today. The benefits of solving cooperation problems consisting of large numbers of tasks on multiple processors can be realized only if one can effectively marshal the available computing resources to achieve substantial speed-up relative to the time needed to solve the problem on a single fast computer, or on a few such computers in a tightly coupled multiprocessor. Achieving high efficiency on distributed computing platforms comprising large numbers of processors requires eliminating redundant computation performed by the processors. This is challenging because the availability of distributed computing resources may fluctuate due to failures and asynchrony of the processors involved, and due to delays and connectivity failures in the underlying network. Such perturbations in the computing medium may degrade the efficiency of algorithms designed to solve computational problems on these systems, and may even cause the algorithms to produce incorrect results. Thus a system containing unreliable and asynchronous components must dedicate resources both to solving the computational problem and to coordinating the fluctuating resources in the presence of adversity.
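The cooperation problem sketched above is often formalized as having p processors jointly perform n idempotent tasks; the central cost is redundant work when processors do not coordinate. The following minimal sketch (the function names, task counts, and shared-set coordination scheme are illustrative assumptions, not taken from the text) contrasts the total work done when each processor independently performs every task with the work done when processors consult a shared record of completed tasks.

```python
# Illustrative sketch (assumed model, not the authors' algorithm):
# p workers must perform n idempotent tasks. We count total task
# executions under two schemes to show the cost of redundancy.

def uncoordinated_work(n_tasks: int, n_workers: int) -> int:
    # Each worker, unaware of the others, performs all n tasks,
    # so total work is n * p regardless of overlap.
    return n_tasks * n_workers

def coordinated_work(n_tasks: int, n_workers: int) -> int:
    # Workers consult a shared set of completed tasks and skip
    # any task already done (idempotence makes skipping safe).
    done: set[int] = set()
    work = 0
    for _worker in range(n_workers):
        for task in range(n_tasks):
            if task not in done:
                done.add(task)
                work += 1
    return work

if __name__ == "__main__":
    n, p = 1000, 10
    print("uncoordinated:", uncoordinated_work(n, p))  # 10000 executions
    print("coordinated:  ", coordinated_work(n, p))    # 1000 executions
```

In a real asynchronous system with crashes, the shared "done" set itself must be maintained by message passing or shared memory, and that coordination consumes resources of its own, which is precisely the trade-off the text describes.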
Keywords: Complexity Measure · Multiple Processor · Partitionable Network · Airborne Radar · Faulty Processor