The Complexity of Synchronous Iterative Do-All with Crashes

Georgiou, Chryssis; Russell, Alexander; Shvartsman, Alex A.

doi:10.1007/3-540-45414-4_11

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2180)

Included in the following conference series:

International Symposium on Distributed Computing

Abstract

Do-All is the problem of performing N tasks in a distributed system of P failure-prone processors [8]. Many distributed and parallel algorithms have been developed for this problem, and several algorithm simulations have been developed by iterating Do-All algorithms. The efficiency of the solutions for Do-All is measured in terms of work complexity, where all processing steps taken by the processors are counted. We present the first non-trivial lower bounds for Do-All that capture the dependence of work on N, P, and f, the number of processor crashes. For the model of computation where processors are able to make perfect load-balancing decisions locally, we also present matching upper bounds. We define the r-iterative Do-All problem that abstracts the repeated use of Do-All such as found in algorithm simulations. Our f-sensitive analysis enables us to derive a tight bound for r-iterative Do-All work (that is stronger than the r-fold work complexity of a single Do-All). Our approach that models perfect load-balancing allows the analysis of specific algorithms to be divided into two parts: (i) the analysis of the cost of tolerating failures while performing work, and (ii) the analysis of the cost of implementing load-balancing. We demonstrate the utility and generality of this approach by improving the analysis of two known efficient algorithms. Finally, we present a new upper bound on simulations of synchronous shared-memory algorithms on crash-prone processors.

This research is supported by the NSF Grant 9988304. The work of the second author is supported in part by the NSF Career Award 0093065. The work of the third author is supported in part by the NSF Career Award 9984774.
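
The work measure described in the abstract can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the algorithm or the adversary analysed in the paper: the function name `do_all_work` and the `crashes` schedule are hypothetical, load balancing is modelled by a global assignment in which the i-th live processor takes the i-th undone task (wrapping around), and a processor that crashes during a round is assumed to be charged for that step without completing its task.

```python
from typing import Dict, Set


def do_all_work(n_tasks: int, n_procs: int, crashes: Dict[int, Set[int]]) -> int:
    """Count processor steps (work) until all tasks are done.

    crashes maps a round number to the set of processor ids that crash
    during that round. Assumption for this sketch: a crashing processor is
    charged for the step, but the task it was assigned is not completed.
    """
    live = list(range(n_procs))     # processors that have not crashed yet
    undone = list(range(n_tasks))   # tasks still to be performed
    work = 0
    rnd = 0
    while undone:
        if not live:
            raise ValueError("all processors crashed before the tasks were done")
        work += len(live)           # every live processor takes one step
        failing = crashes.get(rnd, set())
        # Idealised load balancing: the i-th live processor is assigned the
        # i-th undone task, wrapping around when processors outnumber tasks.
        done = {
            undone[i % len(undone)]
            for i, p in enumerate(live)
            if p not in failing
        }
        live = [p for p in live if p not in failing]
        undone = [t for t in undone if t not in done]
        rnd += 1
    return work


if __name__ == "__main__":
    print(do_all_work(4, 4, {}))            # no crashes: 4 steps
    print(do_all_work(4, 4, {0: {0, 1}}))   # two crashes in round 0: 6 steps
```

In this toy model, crashes increase work only by wasting steps that were already charged; with N = P = 4, two crashes in the first round raise the work from 4 to 6. An r-iterative run could be imitated by calling the function r times with the surviving processors carried over, which this sketch does not attempt.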

References

1. Aumann, Y., Rabin, M.O.: Clock Construction in Fully Asynchronous Parallel Systems and PRAM Simulation. 33rd IEEE Symp. on Foundations of Computer Science (1993) 147–156.
2. Anderson, R.J., Woll, H.: Algorithms for the Certified Write All Problem. SIAM Journal of Computing, Vol. 26(5) (1997) 1277–1283.
3. Buss, J., Kanellakis, P.C., Ragde, P., Shvartsman, A.A.: Parallel Algorithms with Processor Failures and Delays. Journal of Algorithms, Vol. 20 (1996) 45–86.
4. Chlebus, B.S., De Prisco, R., Shvartsman, A.A.: Performing Tasks on Restartable Message-Passing Processors. Distributed Computing, Vol. 14(1) (2001) 49–64.
5. Dasgupta, P., Kedem, Z., Rabin, M.: Parallel Processing on Networks of Workstations: A Fault-Tolerant, High Performance Approach. International Conference on Distributed Computer Systems (1995) 467–474.
6. De Prisco, R., Mayer, A., Yung, M.: Time-Optimal Message-Efficient Work Performance in the Presence of Faults. 13th ACM Symposium on Principles of Distributed Computing (1994) 161–172.
7. Dolev, S., Segala, R., Shvartsman, A.: Dynamic Load Balancing with Group Communication. 6th International Colloquium on Structural Information and Communication Complexity (1999) 111–125.
8. Dwork, C., Halpern, J., Waarts, O.: Performing Work Efficiently in the Presence of Faults. SIAM J. on Computing, Vol. 27(5) (1998) 1457–1491.
9. Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of Distributed Consensus with one Faulty Process. Journal of the ACM, Vol. 32(2) (1985) 374–382.
10. Galil, Z., Mayer, A., Yung, M.: Resolving Message Complexity of Byzantine Agreement and Beyond. 36th IEEE Symp. on Foundations of Comp. Sc. (1995) 724–733.
11. Georgiou, C., Shvartsman, A.: Cooperative Computing with Fragmentable and Mergeable Groups. 7th International Colloquium on Structural Information and Communication Complexity (2000) 141–156.
12. Georgiou, C., Russell, A., Shvartsman, A.: The Complexity of Distributed Cooperation in the Presence of Failures. 4th International Conference on Principles of Distributed Systems (2000) 245–264.
13. Georgiou, C., Russell, A., Shvartsman, A.: The Complexity of Synchronous Iterative Do-All with Crashes. http://www.engr.uconn.edu/~acr/Papers/faults.ps.
14. Groote, J.F., Hesselink, W.H., Mauw, S., Vermeulen, R.: An Algorithm for the Asynchronous Write-All Problem Based on Process Collision. Distributed Computing (2001).
15. Hadzilacos, V., Toueg, S.: Fault-Tolerant Broadcasts and Related Problems. Distributed Computing, 2nd Ed., Addison-Wesley and ACM Press (1993).
16. Hesselink, W.H., Groote, J.F.: Waitfree Distributed Memory Management by Create, and Read Until Deletion (CRUD). Technical report SEN-R9811, CWI, Amsterdam (1998).
17. Kanellakis, P.C., Shvartsman, A.A.: Efficient Parallel Algorithms Can Be Made Robust. Distributed Computing, Vol. 5 (1992) 201–217.
18. Kanellakis, P.C., Shvartsman, A.A.: Fault-Tolerant Parallel Computation. Kluwer Academic Publishers (1997) ISBN 0-7923-9922-6.
19. Kedem, Z.M., Palem, K.V., Raghunathan, A., Spirakis, P.: Combining Tentative and Definite Executions for Dependable Parallel Computing. 23d ACM Symposium on Theory of Computing (1991) 381–390.
20. Kedem, Z.M., Palem, K.V., Rabin, M.O., Raghunathan, A.: Efficient Program Transformations for Resilient Parallel Computation via Randomization. 24th ACM Symp. on Theory of Computing (1992) 306–318.
21. Kedem, Z.M., Palem, K.V., Spirakis, P.: Efficient Robust Parallel Computations. 22nd ACM Symp. on Theory of Computing (1990) 138–148.
22. Lamport, L., Lynch, N.A.: Distributed Computing: Models and Methods. Handbook of Theoretical Computer Science, Vol. 1, North-Holland (1990).
23. Lamport, L., Shostak, R., Pease, M.: The Byzantine Generals Problem. ACM TOPLAS, Vol. 4(3) (1982) 382–401.
24. Malewicz, G.G., Russell, A., Shvartsman, A.A.: Distributed Cooperation in the Absence of Communication. 14th International Symposium on Distributed Computing (2000) 119–133.
25. Martel, C., Subramonian, R.: On the Complexity of Certified Write-All Algorithms. Journal of Algorithms, Vol. 16(3) (1994) 361–387.
26. Martel, C., Park, A., Subramonian, R.: Work-Optimal Asynchronous Algorithms for Shared Memory Parallel Computers. SIAM Journal on Computing, Vol. 21 (1992) 1070–1099.
27. Martel, C., Subramonian, R., Park, A.: Asynchronous PRAMs are (Almost) as Good as Synchronous PRAMs. 32d IEEE Symp. on Foundations of Computer Science (1990) 590–599.
28. Pease, M., Shostak, R., Lamport, L.: Reaching Agreement in the Presence of Faults. Journal of the ACM, Vol. 27(2) (1980) 228–234.
29. Shvartsman, A.A.: Achieving Optimal CRCW PRAM Fault-Tolerance. Information Processing Letters, Vol. 39(2) (1991) 59–66.
30. Schlichting, R.D., Schneider, F.B.: Fail-Stop Processors: An Approach to Designing Fault-Tolerant Computing Systems. ACM TOCS, Vol. 1(3) (1983) 222–238.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, 06269, USA
Chryssis Georgiou, Alexander Russell & Alex A. Shvartsman
Laboratory of Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
Alex A. Shvartsman

Editor information

Editors and Affiliations

Department of Computer Science, Texas A&M University, College Station, TX, 77843-3112, USA
Jennifer Welch

Cite this paper

Georgiou, C., Russell, A., Shvartsman, A.A. (2001). The Complexity of Synchronous Iterative Do-All with Crashes. In: Welch, J. (eds) Distributed Computing. DISC 2001. Lecture Notes in Computer Science, vol 2180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45414-4_11

DOI: https://doi.org/10.1007/3-540-45414-4_11
Published: 11 September 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42605-9
Online ISBN: 978-3-540-45414-4
eBook Packages: Springer Book Archive
