Abstract
Do-All is the problem of performing N tasks in a distributed system of P failure-prone processors [8]. Many distributed and parallel algorithms have been developed for this problem, and several algorithm simulations have been developed by iterating Do-All algorithms. The efficiency of Do-All solutions is measured in terms of work complexity, which counts all processing steps taken by the processors. We present the first non-trivial lower bounds for Do-All that capture the dependence of work on N, P, and f, the number of processor crashes. For the model of computation where processors are able to make perfect load-balancing decisions locally, we also present matching upper bounds. We define the r-iterative Do-All problem, which abstracts the repeated use of Do-All found, for example, in algorithm simulations. Our f-sensitive analysis enables us to derive a tight bound for r-iterative Do-All work that is stronger than r times the work complexity of a single Do-All. Our approach, which models perfect load balancing, allows the analysis of specific algorithms to be divided into two parts: (i) the analysis of the cost of tolerating failures while performing work, and (ii) the analysis of the cost of implementing load balancing. We demonstrate the utility and generality of this approach by improving the analysis of two known efficient algorithms. Finally, we present a new upper bound on simulations of synchronous shared-memory algorithms on crash-prone processors.
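The work measure described in the abstract can be illustrated with a toy simulation. The sketch below is our own simplification and is not the paper's model, adversary, or algorithm. It assumes synchronous rounds, a perfect load-balancing oracle that spreads the remaining tasks evenly over the live processors, and a hypothetical greedy adversary that crashes the processors covering the least-replicated tasks, so that those tasks' progress is lost. The function name `do_all_work` and its crash-schedule argument (a global round index mapped to the number of crashes in that round, whose sum plays the role of f) are illustrative only. Setting `r` above 1 runs r Do-All instances back to back with crashed processors staying crashed, a rough analogue of r-iterative Do-All.

```python
def do_all_work(n, p, crashes_per_round, r=1):
    """Count processor steps (work) for r back-to-back Do-All instances of n tasks.

    Hypothetical, simplified model for illustration only:
      * synchronous rounds; each processor alive at the start of a round takes one step;
      * an oracle balances the remaining tasks perfectly over the live processors;
      * crashes_per_round[t] processors crash in global round t, after taking their
        step but before its result is recorded (so their task may be lost).
    """
    assert n >= 0 and p >= 1
    live, work, t = p, 0, 0
    for _ in range(r):
        remaining = n
        while remaining > 0:
            # Perfect load balancing: coverage[i] is the number of live
            # processors assigned to the i-th task attempted this round.
            if live >= remaining:
                base, extra = divmod(live, remaining)
                coverage = [base + 1] * extra + [base] * (remaining - extra)
            else:
                coverage = [1] * live  # only `live` distinct tasks can be attempted
            work += live  # every live processor is charged one step
            # Hypothetical adversary: never crash the last live processor.
            budget = min(crashes_per_round.get(t, 0), live - 1)
            live -= budget
            # Greedily spend crashes on the least-covered tasks; a task is lost
            # only if every processor assigned to it crashes. Leftover crashes
            # hit tasks that still have a surviving processor.
            lost = 0
            for c in sorted(coverage):
                if c > budget:
                    break
                budget -= c
                lost += 1
            remaining -= len(coverage) - lost
            t += 1
    return work


print(do_all_work(8, 4, {}))           # 8: no crashes and P divides N, so work equals N
print(do_all_work(8, 4, {0: 2}))       # 10
print(do_all_work(8, 4, {0: 2}, r=2))  # 18: the early crashes also slow the second instance
```

The simulation evaluates only one fixed crash schedule under one simple adversary; it does not reproduce the lower or upper bounds claimed above, which quantify over all adversaries and algorithms.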
This research is supported by the NSF Grant 9988304. The work of the second author is supported in part by the NSF Career Award 0093065. The work of the third author is supported in part by the NSF Career Award 9984774.
References
[1] Aumann, Y., Rabin, M.O.: Clock Construction in Fully Asynchronous Parallel Systems and PRAM Simulation. 33rd IEEE Symp. on Foundations of Computer Science (1993) 147–156.
[2] Anderson, R.J., Woll, H.: Algorithms for the Certified Write-All Problem. SIAM Journal on Computing, Vol. 26 5 (1997) 1277–1283.
[3] Buss, J., Kanellakis, P.C., Ragde, P., Shvartsman, A.A.: Parallel Algorithms with Processor Failures and Delays. Journal of Algorithms, Vol. 20 (1996) 45–86.
[4] Chlebus, B.S., De Prisco, R., Shvartsman, A.A.: Performing Tasks on Restartable Message-Passing Processors. Distributed Computing, Vol. 14 1 (2001) 49–64.
[5] Dasgupta, P., Kedem, Z., Rabin, M.: Parallel Processing on Networks of Workstations: A Fault-Tolerant, High Performance Approach. International Conference on Distributed Computer Systems (1995) 467–474.
[6] De Prisco, R., Mayer, A., Yung, M.: Time-Optimal Message-Efficient Work Performance in the Presence of Faults. 13th ACM Symp. on Principles of Distributed Computing (1994) 161–172.
[7] Dolev, S., Segala, R., Shvartsman, A.: Dynamic Load Balancing with Group Communication. 6th International Colloquium on Structural Information and Communication Complexity (1999) 111–125.
[8] Dwork, C., Halpern, J., Waarts, O.: Performing Work Efficiently in the Presence of Faults. SIAM Journal on Computing, Vol. 27 5 (1998) 1457–1491.
[9] Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of Distributed Consensus with One Faulty Process. Journal of the ACM, Vol. 32 2 (1985) 374–382.
[10] Galil, Z., Mayer, A., Yung, M.: Resolving Message Complexity of Byzantine Agreement and Beyond. 36th IEEE Symp. on Foundations of Computer Science (1995) 724–733.
[11] Georgiou, C., Shvartsman, A.: Cooperative Computing with Fragmentable and Mergeable Groups. 7th International Colloquium on Structural Information and Communication Complexity (2000) 141–156.
[12] Georgiou, C., Russell, A., Shvartsman, A.: The Complexity of Distributed Cooperation in the Presence of Failures. 4th International Conference on Principles of Distributed Systems (2000) 245–264.
[13] Georgiou, C., Russell, A., Shvartsman, A.: The Complexity of Synchronous Iterative Do-All with Crashes. http://www.engr.uconn.edu/~acr/Papers/faults.ps.
[14] Groote, J.F., Hesselink, W.H., Mauw, S., Vermeulen, R.: An Algorithm for the Asynchronous Write-All Problem Based on Process Collision. Distributed Computing (2001).
[15] Hadzilacos, V., Toueg, S.: Fault-Tolerant Broadcasts and Related Problems. Distributed Computing, 2nd Ed., Addison-Wesley and ACM Press (1993).
[16] Hesselink, W.H., Groote, J.F.: Waitfree Distributed Memory Management by Create, and Read Until Deletion (CRUD). Technical Report SEN-R9811, CWI, Amsterdam (1998).
[17] Kanellakis, P.C., Shvartsman, A.A.: Efficient Parallel Algorithms Can Be Made Robust. Distributed Computing, Vol. 5 (1992) 201–217.
[18] Kanellakis, P.C., Shvartsman, A.A.: Fault-Tolerant Parallel Computation. Kluwer Academic Publishers (1997) ISBN 0-7923-9922-6.
[19] Kedem, Z.M., Palem, K.V., Raghunathan, A., Spirakis, P.: Combining Tentative and Definite Executions for Dependable Parallel Computing. 23rd ACM Symp. on Theory of Computing (1991) 381–390.
[20] Kedem, Z.M., Palem, K.V., Rabin, M.O., Raghunathan, A.: Efficient Program Transformations for Resilient Parallel Computation via Randomization. 24th ACM Symp. on Theory of Computing (1992) 306–318.
[21] Kedem, Z.M., Palem, K.V., Spirakis, P.: Efficient Robust Parallel Computations. 22nd ACM Symp. on Theory of Computing (1990) 138–148.
[22] Lamport, L., Lynch, N.A.: Distributed Computing: Models and Methods. Handbook of Theoretical Computer Science, Vol. 1, North-Holland (1990).
[23] Lamport, L., Shostak, R., Pease, M.: The Byzantine Generals Problem. ACM TOPLAS, Vol. 4 3 (1982) 382–401.
[24] Malewicz, G.G., Russell, A., Shvartsman, A.A.: Distributed Cooperation in the Absence of Communication. 14th International Symposium on Distributed Computing (2000) 119–133.
[25] Martel, C., Subramonian, R.: On the Complexity of Certified Write-All Algorithms. Journal of Algorithms, Vol. 16 3 (1994) 361–387.
[26] Martel, C., Park, A., Subramonian, R.: Work-Optimal Asynchronous Algorithms for Shared Memory Parallel Computers. SIAM Journal on Computing, Vol. 21 (1992) 1070–1099.
[27] Martel, C., Subramonian, R., Park, A.: Asynchronous PRAMs are (Almost) as Good as Synchronous PRAMs. 32nd IEEE Symp. on Foundations of Computer Science (1990) 590–599.
[28] Pease, M., Shostak, R., Lamport, L.: Reaching Agreement in the Presence of Faults. Journal of the ACM, Vol. 27 2 (1980) 228–234.
[29] Shvartsman, A.A.: Achieving Optimal CRCW PRAM Fault-Tolerance. Information Processing Letters, Vol. 39 2 (1991) 59–66.
[30] Schlichting, R.D., Schneider, F.B.: Fail-Stop Processors: An Approach to Designing Fault-Tolerant Computing Systems. ACM TOCS, Vol. 1 3 (1983) 222–238.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Georgiou, C., Russell, A., Shvartsman, A.A. (2001). The Complexity of Synchronous Iterative Do-All with Crashes. In: Welch, J. (eds) Distributed Computing. DISC 2001. Lecture Notes in Computer Science, vol 2180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45414-4_11
DOI: https://doi.org/10.1007/3-540-45414-4_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42605-9
Online ISBN: 978-3-540-45414-4
eBook Packages: Springer Book Archive