Fault-Tolerance and Efficiency in Massively Parallel Algorithms

  • Paris C. Kanellakis
  • Alex A. Shvartsman
Part of The Springer International Series in Engineering and Computer Science book series (SECS, volume 284)

Abstract

We present an overview of massively parallel deterministic algorithms that combine high fault-tolerance with efficiency. This desirable combination (called robustness here) is nontrivial, since increasing efficiency implies removing redundancy, whereas increasing fault-tolerance requires adding redundancy to computations. We study a spectrum of algorithmic models for which significant robustness is achievable, from static-fault, synchronous computation to dynamic-fault, asynchronous computation. In addition to fail-stop processor models, we consider arbitrarily initialized memory and restricted memory access concurrency. We survey the deterministic upper bounds for the basic Write-All primitive and the lower bounds on its efficiency, and we identify some of the key open questions. We also generalize the robust computing of functions to relations; this new approach can model approximate computations. We show how to compute approximate Write-All optimally. Finally, we synthesize the state of the art in a complexity classification that extends the traditional classification of efficient parallel algorithms with fault-tolerance.
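
As an illustration of the tension between redundancy and efficiency, the following is a minimal sketch of a toy Write-All simulation. It assumes the usual statement of the problem (using P processors, set every entry of an n-element array, initially all zero, to 1) and a synchronous fail-stop model without restarts. The function name `write_all`, the `crash_at` failure schedule, and the cost measure (one unit per completed processor step) are hypothetical choices made for this sketch. The scheduler here has global knowledge of which cells remain unwritten, which real algorithms do not have, so this is not one of the surveyed algorithms.

```python
def write_all(n, p, crash_at=None):
    """Toy synchronous Write-All with fail-stop processors (no restarts).

    n        -- size of the shared array, initially all zeros
    p        -- number of processors, with ids 0..p-1
    crash_at -- hypothetical failure schedule: maps a processor id to the
                first round in which that processor no longer takes steps
    Returns (array, work, rounds), where work counts completed steps.
    """
    crash_at = crash_at or {}
    x = [0] * n
    work = 0
    rnd = 0
    while not all(x):
        # Processors whose crash round has not yet been reached are alive.
        live = [pid for pid in range(p)
                if crash_at.get(pid, float("inf")) > rnd]
        if not live:
            raise RuntimeError("all processors failed before completion")
        # Assumption: an oracle tells every processor which cells remain.
        todo = [i for i, v in enumerate(x) if v == 0]
        for rank, _pid in enumerate(live):
            # Surplus processors wrap around and write redundantly.
            x[todo[rank % len(todo)]] = 1
            work += 1
        rnd += 1
    return x, work, rnd


if __name__ == "__main__":
    print(write_all(8, 4))                          # no failures
    print(write_all(8, 4, {1: 0, 2: 0, 3: 1}))      # three early crashes
    print(write_all(2, 4))                          # more processors than cells
```

With no failures and p ≤ n the work equals n. When processors outnumber the remaining cells, the extra writes are redundant and the work exceeds n, which is the kind of overhead that a robust algorithm has to keep small even though it lacks the oracle used here.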

Keywords

  • Parallel Algorithm
  • Shared Memory
  • Random Access Machine
  • Overhead Ratio
  • Faulty Processor
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Kluwer Academic Publishers 1994

Authors and Affiliations

  • Paris C. Kanellakis (1)
  • Alex A. Shvartsman (2)

  1. Computer Science Department, Brown University, Providence, USA
  2. Digital Equipment Corporation, Digital Consulting Technology Office, Littleton, USA
