Introduction

  • Paris Christos Kanellakis
  • Alex Allister Shvartsman
Chapter
Part of The Springer International Series in Engineering and Computer Science book series (SECS, volume 401)

Abstract

This study of fault-tolerant parallel computation uses models of computation based on the parallel random access machine, or PRAM. The PRAM model is generally accepted as a convenient abstraction for defining and analyzing parallel algorithms. However, it makes several assumptions that call its practicality into question, chiefly global synchronization of processors, high-bandwidth concurrent access to shared memory, and infallibility of processors, interconnections, and memory. In this monograph we pursue the goal of preserving the high-level PRAM abstraction that makes it attractive for programming parallel algorithms, while narrowing the gap between PRAMs and realizable parallel machines. Our primary focus is removing the assumption that processors are failure-free. In some settings we also show how to relax the assumption of global synchrony and how to limit shared-memory access concurrency in fault-tolerant algorithms while preserving their efficiency.
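The flavor of the problem can be conveyed by a toy simulation of the Write-All task (cooperatively setting every cell of a shared array to 1) on a synchronous PRAM whose processors may crash between steps. This is only an illustrative sketch, not an algorithm from the monograph; the function name, crash model, and cell-assignment rule are all hypothetical simplifications.

```python
import random

def write_all(n, p, crash_prob=0.3, seed=1):
    """Toy simulation: p fail-stop processors cooperate to set all n
    cells of a shared array to 1, despite crashes between steps."""
    rng = random.Random(seed)
    shared = [0] * n               # simulated shared memory
    alive = list(range(p))         # ids of surviving processors
    steps = 0
    while 0 in shared:
        steps += 1
        # Each survivor picks an unwritten cell; concurrent writes to the
        # same cell are harmless, since all writers store the same value.
        todo = [i for i, v in enumerate(shared) if v == 0]
        for rank, pid in enumerate(alive):
            shared[todo[rank % len(todo)]] = 1
        # An adversary crashes some processors after the step; we keep at
        # least one alive so the simulation is guaranteed to terminate.
        alive = [pid for pid in alive if rng.random() > crash_prob] or alive[:1]
    return shared, steps

mem, steps = write_all(16, 8)
assert all(mem)  # every cell written despite crashes
```

Even this crude model exposes the central tension the monograph studies: with crashes, the total work performed can exceed the failure-free optimum, and the goal is to bound that overhead while keeping the PRAM-style programming abstraction.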

Keywords

Parallel Algorithm, Shared Memory, Systolic Array, Parallel Time, Triple Modular Redundancy


Bibliographic Notes

  1. [60] R.M. Karp and V. Ramachandran, “A Survey of Parallel Algorithms for Shared-Memory Machines”, in Handbook of Theoretical Computer Science (ed. J. van Leeuwen), vol. 1, North-Holland, 1990.
  2. [101] N. Pippenger, “Communication Networks”, in Handbook of Theoretical Computer Science (ed. J. van Leeuwen), vol. 1, North-Holland, 1990.
  3. [116] L. Valiant, “General Purpose Parallel Architectures”, in Handbook of Theoretical Computer Science (ed. J. van Leeuwen), vol. 1, North-Holland, 1990.
  4. [74] F. Thomson Leighton, Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes, Morgan Kaufmann Publishers, San Mateo, CA, 1992.
  5. [42] S. Fortune and J. Wyllie, “Parallelism in Random Access Machines”, Proc. of the 10th ACM Symposium on Theory of Computing, pp. 114–118, 1978.
  6. [30] T.H. Cormen, C.E. Leiserson, and R.L. Rivest, Introduction to Algorithms, MIT Press, 1990.
  7. [70] H.T. Kung and C.E. Leiserson, “Algorithms for VLSI Processor Arrays”, presented at the Symposium on Sparse Matrix Computations and Their Applications, Knoxville, TN, 1978.
  8. [89] C. Mead and L. Conway, Introduction to VLSI Systems, Addison-Wesley, Reading, MA, 1980.
  9. [94] S. Owicki and D. Gries, “An Axiomatic Proof Technique for Parallel Programs I”, Acta Informatica, vol. 6, pp. 319–340, 1976.
  10. [41] M.J. Flynn, “Very High Speed Computing Systems”, Proceedings of the IEEE, vol. 54, no. 12, pp. 1901–1909, 1966.
  11. [64] D.E. Knuth, The Art of Computer Programming, vol. 3: Sorting and Searching, Addison-Wesley Publ. Co., 1973.
  12. [106] L. Rudolph, “A Robust Sorting Network”, IEEE Transactions on Computers, vol. 34, no. 4, pp. 326–335, 1985.
  13. [31] F. Cristian, “Understanding Fault-Tolerant Distributed Systems”, Communications of the ACM, vol. 34, no. 2, pp. 56–78, 1991.
  14. [80] N.A. Lynch, Distributed Algorithms, Morgan Kaufmann Publishers, San Mateo, CA, 1995.
  15. [97] M. Pease, R. Shostak, and L. Lamport, “Reaching Agreement in the Presence of Faults”, Journal of the ACM, vol. 27, no. 2, pp. 228–234, 1980.
  16. [73] L. Lamport, R. Shostak, and M. Pease, “The Byzantine Generals Problem”, ACM TOPLAS, vol. 4, no. 3, pp. 382–401, 1982.
  17. [40] M.J. Fischer, N.A. Lynch, and M.S. Paterson, “Impossibility of Distributed Consensus with One Faulty Process”, Journal of the ACM, vol. 32, no. 2, pp. 374–382, 1985.
  18. [81] N.A. Lynch, “One Hundred Impossibility Proofs for Distributed Computing”, Proc. of the 8th ACM Symposium on Principles of Distributed Computing, pp. 1–27, 1989.
  19. [72] L. Lamport and N.A. Lynch, “Distributed Computing: Models and Methods”, in Handbook of Theoretical Computer Science (ed. J. van Leeuwen), vol. 1, North-Holland, 1990.
  20. [108] R.D. Schlichting and F.B. Schneider, “Fail-Stop Processors: An Approach to Designing Fault-Tolerant Computing Systems”, ACM Transactions on Computer Systems, vol. 1, no. 3, pp. 222–238, 1983.
  21. [87] C. Martel, R. Subramonian, and A. Park, “Asynchronous PRAMs are (Almost) as Good as Synchronous PRAMs”, Proc. of the 31st IEEE Symposium on Foundations of Computer Science, pp. 590–599, 1990.
  22. [63] Z.M. Kedem, K.V. Palem, and P. Spirakis, “Efficient Robust Parallel Computations”, Proc. of the 22nd ACM Symposium on Theory of Computing, pp. 138–148, 1990.
  23. [56] P.C. Kanellakis and A.A. Shvartsman, “Efficient Parallel Algorithms Can Be Made Robust”, Distributed Computing, vol. 5, no. 4, pp. 201–217, 1992; preliminary version in Proc. of the 8th ACM PODC, pp. 211–222, 1989.
  24. [62] Z.M. Kedem, K.V. Palem, A. Raghunathan, and P. Spirakis, “Combining Tentative and Definite Executions for Dependable Parallel Computing”, Proc. of the 23rd ACM Symposium on Theory of Computing, pp. 381–390, 1991.

Copyright information

© Springer Science+Business Media New York 1997

Authors and Affiliations

  • Paris Christos Kanellakis (1)
  • Alex Allister Shvartsman (2)
  1. Brown University, Providence, USA
  2. Massachusetts Institute of Technology, Cambridge, USA
