Foundations of Dependable Computing, pp. 243-263
Constructing Dependable Distributed Systems Using Consul
Abstract
Constructing the software for a distributed system that can continue to provide dependable service despite failures is a complex task. Consul is a communication substrate that simplifies this task by providing a collection of fundamental abstractions for implementing replicated processing. These include provisions for transmitting messages atomically and in some consistent order to a group of processes (atomic multicast), for detecting failures and agreeing on the resulting system composition (membership), and for reestablishing a consistent process state following failure (recovery). This chapter outlines the features provided by Consul and its implementation using the x-kernel.
Keywords
State Machine; Stable Storage; Recovery Service; Logical Time; Recovery Protocol