The Complexity of Synchronous Iterative Do-All with Crashes

  • Conference paper
Distributed Computing (DISC 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2180)

Abstract

Do-All is the problem of performing N tasks in a distributed system of P failure-prone processors [8]. Many distributed and parallel algorithms have been developed for this problem, and several algorithm simulations have been developed by iterating Do-All algorithms. The efficiency of solutions for Do-All is measured in terms of work complexity, where all processing steps taken by the processors are counted. We present the first non-trivial lower bounds for Do-All that capture the dependence of work on N, P, and f, the number of processor crashes. For the model of computation where processors are able to make perfect load-balancing decisions locally, we also present matching upper bounds. We define the r-iterative Do-All problem, which abstracts the repeated use of Do-All such as found in algorithm simulations. Our f-sensitive analysis enables us to derive a tight bound for r-iterative Do-All work that is stronger than the r-fold work complexity of a single Do-All. Our approach of modeling perfect load-balancing allows the analysis of specific algorithms to be divided into two parts: (i) the analysis of the cost of tolerating failures while performing work, and (ii) the analysis of the cost of implementing load-balancing. We demonstrate the utility and generality of this approach by improving the analysis of two known efficient algorithms. Finally, we present a new upper bound on simulations of synchronous shared-memory algorithms on crash-prone processors.
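
To make the work measure concrete, the following is a minimal sketch of a synchronous r-iterative Do-All run under an idealized load-balancing oracle. It is not the paper's algorithm or its adversary. The function name `simulate_iterative_do_all`, the `crash_schedule` parameter, and the crash semantics (crashes take effect at the start of a round, at least one processor always survives, and only live processors are charged for a step) are assumptions made for illustration.

```python
from itertools import chain, repeat


def simulate_iterative_do_all(n_tasks, n_procs, crash_schedule, r=1):
    """Simulate r successive Do-All instances; return (rounds, work).

    Assumptions (hypothetical, for illustration only):
      * n_procs >= 1 and every entry of crash_schedule is >= 0;
      * crash_schedule[i] processors crash at the start of round i,
        but at least one processor always stays alive;
      * an oracle balances load perfectly, so each live processor
        performs a distinct undone task whenever enough tasks remain;
      * work is the number of steps taken by live processors.
    """
    live = n_procs
    # After the schedule is exhausted, no further crashes occur.
    crashes = chain(crash_schedule, repeat(0))
    rounds = 0
    work = 0
    for _ in range(r):
        undone = n_tasks  # each iteration starts a fresh set of N tasks
        while undone > 0:
            # Crash processors, never removing the last live one.
            live -= min(next(crashes), live - 1)
            # Perfect balancing: up to `live` distinct tasks finish.
            undone -= min(live, undone)
            # Every live processor is charged one step, even if redundant.
            work += live
            rounds += 1
    return rounds, work


if __name__ == "__main__":
    # 8 tasks, 4 processors, no crashes.
    print(simulate_iterative_do_all(8, 4, []))
    # 4 tasks, 8 processors, 6 crashes in the first round, 3 iterations.
    print(simulate_iterative_do_all(4, 8, [6], r=3))
```

In this toy model the total number of crashes f is the sum of the schedule, capped at P - 1. When P exceeds the number of undone tasks, the surplus processors still contribute to work, which is one way to see why work depends on P and f as well as on N.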

This research is supported by the NSF Grant 9988304. The work of the second author is supported in part by the NSF Career Award 0093065. The work of the third author is supported in part by the NSF Career Award 9984774.

References

  1. Aumann, Y., Rabin, M.O.: Clock Construction in Fully Asynchronous Parallel Systems and PRAM Simulation. 33rd IEEE Symp. on Foundations of Computer Science (1993) 147–156.

  2. Anderson, R.J., Woll, H.: Algorithms for the Certified Write All Problem. SIAM Journal of Computing, Vol. 26 5 (1997) 1277–1283.

  3. Buss, J., Kanellakis, P.C., Ragde, P., Shvartsman, A.A.: Parallel Algorithms with Processor Failures and Delays. Journal of Algorithms, Vol. 20 (1996) 45–86.

  4. Chlebus, B.S., De Prisco, R., Shvartsman, A.A.: Performing Tasks on Restartable Message-Passing Processors. Distributed Computing, Vol. 14 1 (2001) 49–64.

  5. Dasgupta, P., Kedem, Z., Rabin, M.: Parallel Processing on Networks of Workstations: A Fault-Tolerant, High Performance Approach. International Conference on Distributed Computing Systems (1995) 467–474.

  6. De Prisco, R., Mayer, A., Yung, M.: Time-Optimal Message-Efficient Work Performance in the Presence of Faults. 13th ACM Symposium on Principles of Distributed Computing (1994) 161–172.

  7. Dolev, S., Segala, R., Shvartsman, A.: Dynamic Load Balancing with Group Communication. 6th International Colloquium on Structural Information and Communication Complexity (1999) 111–125.

  8. Dwork, C., Halpern, J., Waarts, O.: Performing Work Efficiently in the Presence of Faults. SIAM J. on Computing, Vol. 27 5 (1998) 1457–1491.

  9. Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of Distributed Consensus with one Faulty Process. Journal of the ACM, Vol. 32 2 (1985) 374–382.

  10. Galil, Z., Mayer, A., Yung, M.: Resolving Message Complexity of Byzantine Agreement and Beyond. 36th IEEE Symp. on Foundations of Comp. Sc. (1995) 724–733.

  11. Georgiou, C., Shvartsman, A.: Cooperative Computing with Fragmentable and Mergeable Groups. 7th International Colloquium on Structural Information and Communication Complexity (2000) 141–156.

  12. Georgiou, C., Russell, A., Shvartsman, A.: The Complexity of Distributed Cooperation in the Presence of Failures. 4th International Conference on Principles of Distributed Systems (2000) 245–264.

  13. Georgiou, C., Russell, A., Shvartsman, A.: The Complexity of Synchronous Iterative Do-All with Crashes. http://www.engr.uconn.edu/~acr/Papers/faults.ps.

  14. Groote, J.F., Hesselink, W.H., Mauw, S., Vermeulen, R.: An Algorithm for the Asynchronous Write-All Problem Based on Process Collision. Distributed Computing (2001).

  15. Hadzilacos, V., Toueg, S.: Fault-Tolerant Broadcasts and Related Problems. Distributed Computing, 2nd Ed., Addison-Wesley and ACM Press (1993).

  16. Hesselink, W.H., Groote, J.F.: Waitfree Distributed Memory Management by Create, and Read Until Deletion (CRUD). Technical report SEN-R9811, CWI, Amsterdam (1998).

  17. Kanellakis, P.C., Shvartsman, A.A.: Efficient Parallel Algorithms Can Be Made Robust. Distributed Computing, Vol. 5 (1992) 201–217.

  18. Kanellakis, P.C., Shvartsman, A.A.: Fault-Tolerant Parallel Computation. Kluwer Academic Publishers (1997) ISBN 0-7923-9922-6.

  19. Kedem, Z.M., Palem, K.V., Raghunathan, A., Spirakis, P.: Combining Tentative and Definite Executions for Dependable Parallel Computing. 23rd ACM Symposium on Theory of Computing (1991) 381–390.

  20. Kedem, Z.M., Palem, K.V., Rabin, M.O., Raghunathan, A.: Efficient Program Transformations for Resilient Parallel Computation via Randomization. 24th ACM Symp. on Theory of Computing (1992) 306–318.

  21. Kedem, Z.M., Palem, K.V., Spirakis, P.: Efficient Robust Parallel Computations. 22nd ACM Symp. on Theory of Computing (1990) 138–148.

  22. Lamport, L., Lynch, N.A.: Distributed Computing: Models and Methods. Handbook of Theoretical Computer Science, Vol. 1, North-Holland (1990).

  23. Lamport, L., Shostak, R., Pease, M.: The Byzantine Generals Problem. ACM TOPLAS, Vol. 4 3 (1982) 382–401.

  24. Malewicz, G.G., Russell A., Shvartsman, A.A.: Distributed Cooperation in the Absence of Communication. 14th International Symposium on Distributed Computing (2000) 119–133.

  25. Martel, C., Subramonian, R.: On the Complexity of Certified Write-All Algorithms. Journal of Algorithms, Vol. 16 3 (1994) 361–387.

  26. Martel, C., Park, A., Subramonian, R.: Work-Optimal Asynchronous Algorithms for Shared Memory Parallel Computers. SIAM Journal on Computing, Vol. 21 (1992) 1070–1099.

  27. Martel, C., Subramonian, R., Park, A.: Asynchronous PRAMs are (Almost) as Good as Synchronous PRAMs. 32nd IEEE Symp. on Foundations of Computer Science (1990) 590–599.

  28. Pease, M., Shostak, R., Lamport, L.: Reaching Agreement in the Presence of Faults. Journal of the ACM, Vol. 27 2 (1980) 228–234.

  29. Shvartsman, A.A.: Achieving Optimal CRCW PRAM Fault-Tolerance. Information Processing Letters, Vol. 39 2 (1991) 59–66.

  30. Schlichting, R.D., Schneider, F.B.: Fail-Stop Processors: An Approach to Designing Fault-Tolerant Computing Systems. TOCS 1, Vol. 3 (1983) 222–238.

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Georgiou, C., Russell, A., Shvartsman, A.A. (2001). The Complexity of Synchronous Iterative Do-All with Crashes. In: Welch, J. (eds) Distributed Computing. DISC 2001. Lecture Notes in Computer Science, vol 2180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45414-4_11

  • DOI: https://doi.org/10.1007/3-540-45414-4_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42605-9

  • Online ISBN: 978-3-540-45414-4

  • eBook Packages: Springer Book Archive
