Advertisement

Constructing Dependable Distributed Systems Using Consul

  • Richard D. Schlichting
  • Shivakant Mishra
  • Larry L. Peterson
Part of the The Kluwer International Series in Engineering and Computer Science book series (SECS, volume 285)

Abstract

Constructing the software for a distributed system that can continue to provide dependable service despite failures is a complex task. Consul is a communication substrate that simplifies this task by providing a collection of fundamental abstractions for implementing replicated processing. These include provisions for transmitting messages atomically and in some consistent order to a group of processes (atomic multicast), for detecting failures and agreeing on the resulting system composition (membership), and for reestablishing a consistent process state following failure (recovery). This chapter outlines the features provided by Consul and its implementation using the x-kernel.

Keywords

State Machine Stable Storage Recovery Service Logical Time Recovery Protocol 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. [1]
    F. Schneider, “Implementing fault-tolerant services using the state machine approach: A tutorial,” ACM Computing Surveys, vol. 22, pp. 299–319, Dec. 1990.Google Scholar
  2. [2]
    N. Hutchinson and L. Peterson, “The x-kernel: An architecture for implementing network protocols,” IEEE Trans. on Software Engineering, vol. SE-17, pp. 64–76, Jan. 1991.Google Scholar
  3. [3]
    F. Cristian, “Understanding fault-tolerant distributed systems,” Commun. ACM, vol. 34, pp. 56–78, Feb. 1991.Google Scholar
  4. [4]
    L. Lamport, “Time, clocks, and the ordering of events in a distributed systems,” Commun. ACM, vol. 21, pp. 558–565, July 1978.Google Scholar
  5. [5]
    K. Birman and T. Joseph, “Reliable communication in the presence of failures,” ACM Trans. on Computer Systems, vol. 5, pp. 47–76, Feb. 1987.Google Scholar
  6. [6]
    K. Birman, A. Schiper, and P. Stephenson, “Lightweight causal and atomic group multicast,” ACM Trans. on Computer Systems, vol. 9, pp. 272–314, Aug. 1991.Google Scholar
  7. [7]
    F. Cristian, B. Dancey, and J. Dehn, “Fault-tolerance in the Advanced Automation System,” in Proc. 20th Symp. on Fault-Tolerant Computing, Newcastle-upon-Type, UK, pp. 6–17, June 1990.Google Scholar
  8. [8]
    H. Kopetz, A. Damm, C. Koza, M. Mulazzani, W. Schwabl, C. Senft, and R. Zainlinger, “Distributed fault-tolerant real-time systems: The Mars approach,” IEEE Micro, pp. 25–40, Feb. 1989.Google Scholar
  9. [9]
    D. Powell, ed., Delta-4: A Generic Architecture for Dependable Computing, Research Reports ESPRIT, Vol. 1, Springer-Verlag, 1991.Google Scholar
  10. [10]
    S. Mishra and R. Schlichting, “Abstractions for constructing dependable distributed systems,” Technical report 92-19, Dept. of Computer Science, University of Arizona, 1992.Google Scholar
  11. [11]
    B. Lampson, “Atomic transactions,” in Distributed Systems-Architecture and Implementation (B. Lampson, M. Paul, and H. Seigert, eds.), ch. 11, pp. 246–265, Springer-Verlag, Berlin, 1981.Google Scholar
  12. [12]
    L. Peterson, N. Buchholz, and R. Schlichting, “Preserving and using context information in interprocess communication,” ACM Trans. on Computer Systems, vol. 7, pp. 217–246, Aug. 1989.Google Scholar
  13. [13]
    F. Cristian, H. Aghili, R. Strong, and D. Dolev, “Atomic broadcast: From simple message diffusion to Byzantine agreement,” in Proc. 15th Symp. on Fault-Tolerant Computing, Ann Arbor, MI, pp. 200–206, June 1985.Google Scholar
  14. [14]
    M. Kaashoek, A. Tanenbaum, S. Hummel, and H. Bal, “An efficient reliable broadcast protocol,” Operating Systems Review, vol. 23, pp. 5–19, Oct. 1989.Google Scholar
  15. [15]
    P. Melliar-Smith and L. Moser, “Fault-tolerant distributed systems based on broadcast communication,” in Proc. 9th Conf. on Distributed Computing Systems, Newport Beach, CA, pp. 129–134, June 1989.Google Scholar
  16. [16]
    P. Verissimo, L. Rodrigues, and M. Baptista, “AMp: A highly parallel atomic multicast protocol,” in Proc. SIGCOMM’ 89, Austin, TX, pp. 83–93. Sept. 1989.Google Scholar
  17. [17]
    S. Mishra, L. Peterson, and R. Schlichting, “Consul: A communication substrate for fault-tolerant distributed programs,” Distributed Systems Engineering, vol. 1, pp. 87–103, 1993.CrossRefGoogle Scholar
  18. [18]
    D. Johnson and W. Zwaenepoel, “Sender based message logging,” in Proc. 17th Sxmp. on Fault-Tolerant Computing, Pittsburgh, PA, pp. 14–19, July 1987.Google Scholar
  19. [19]
    D. Bakken and R. Schlichting, “Supporting fault-tolerant parallel programming in Linda,” IEEE Trans. on Parallel and Distributed Systems, to appear. 1994.Google Scholar
  20. [20]
    K. Birman, T. Joseph, T. Raeuchle, and A. El Abbadi, “Implementing fault-tolerant distributed objects,” IEEE Trans. on Software Engineering, vol. SE-11, pp. 502–508, June 1985.Google Scholar
  21. [21]
    A. Birrell, R. Levin, R. Needham, and M. Schroeder, “Grapevine: An exercise in distributed computing,” Commun. ACM, vol. 25, pp. 260–274, Apr. 1982.Google Scholar
  22. [22]
    B. Oki and B. Liskov, “Viewstamped replication: A new primary copy method to support highly-available distributed systems,” in Proc. 7th ACM Symp. on Principles of Distributed Computing, Toronto, Canada, pp. 8–17, Aug. 1988.Google Scholar
  23. [23]
    D. Daniels and A. Spector, “An algorithm for replicated directories,” in Proc. 2nd ACM Symp. on Principles of Distributed Computing, Montreal. Canada, pp. 104–113 Aug. 1983.Google Scholar
  24. [24]
    M. Herlihy, “Extending multiversion time-stamping protocols to exploit type information,” IEEE Trans. on Computers, vol. C-36, pp. 443–448. Apr. 1987.Google Scholar
  25. [25]
    R. Ladin, B. Liskov, L. Shrira, and S. Ghemawat, “Providing high availability using lazy replication,” ACM Trans. on Computer Systems, vol. 10, pp. 360–391, Nov. 1992.Google Scholar
  26. [26]
    J. Chang and N. Maxemchuk, “Reliable broadcast protocols,” ACM Trans. on Computer Systems, vol. 2, pp. 251–273, Aug. 1984.Google Scholar
  27. [27]
    A. Ricciardi and K. Birman, “Using process groups to implement failure detection in asynchronous environments,” in Proc. 10th ACM Symp. on Principles of Distributed Computing, Montreal, Canada, pp. 341–353, Aug. 1991.Google Scholar

Copyright information

© Kluwer Academic Publishers 1994

Authors and Affiliations

  • Richard D. Schlichting
    • 1
  • Shivakant Mishra
    • 2
  • Larry L. Peterson
    • 1
  1. 1.Dept. of Computer ScienceUniv. of ArizonaTucson
  2. 2.Dept. of Computer Science and Eng.Univ. of California San DiegoLa Jolla

Personalised recommendations