Language Support for Fault-Tolerant Parallel and Distributed Programming

  • Richard D. Schlichting
  • David E. Bakken
  • Vicraj T. Thomas
Part of The Springer International Series in Engineering and Computer Science book series (SECS, volume 284)

Abstract

Most high-level programming languages provide little support for writing multicomputer programs that must continue to execute despite failures in the underlying computing platform. This paper describes two projects that address this problem by providing language features specifically designed for fault tolerance. The first is FT-Linda, a version of the Linda coordination language for writing fault-tolerant parallel programs. Its major enhancements are stable tuple spaces, whose contents survive failures, and atomic execution of collections of tuple space operations. The second is FT-SR, a language based on the existing SR distributed programming language. Its major features include support for transparent module replication, ordered group communication, automatic recovery, and failure notification. Prototype versions of both languages have been implemented.

Keywords

Virtual Machine; Stable Storage; Runtime System; Resource Group; Tuple Space
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Kluwer Academic Publishers 1994

Authors and Affiliations

  • Richard D. Schlichting (1)
  • David E. Bakken (1)
  • Vicraj T. Thomas (2)
  1. Dept. of Computer Science, Univ. of Arizona, Tucson
  2. Honeywell Technology Center, Minneapolis
