Abstract
An adaptive computing system is one that modifies its behavior in response to changes in its environment. Since network and processor failures are a common type of environmental change in a distributed system, fault-tolerant distributed systems can be viewed as an important subclass of adaptive systems. Consequently, using adaptive methods to handle failures in this context has the same potential advantages as adaptive systems in general: improved efficiency and structural simplicity. This paper describes a model for adaptive systems that can be applied in many failure scenarios arising in distributed systems. The model divides the adaptation process into three phases (change detection, agreement, and action) that serve as a common means of describing various fault-tolerance algorithms, such as reliable transmission and membership protocols. This not only clarifies the logical structure of such algorithms and the relationships among them, but also provides a unifying implementation framework. Several adaptive fault-tolerant protocols are given as examples. A technique for implementing the model in a distributed system, using an event-driven approach to compose protocols in parallel, is also presented.
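The three-phase model and the event-driven composition described above can be illustrated with a small, single-process sketch. This is not the authors' implementation. The names `EventBus`, `MembershipAdaptation`, and `RetransmissionPolicy`, the event names, and the strict-majority agreement rule are all hypothetical. In addition, the agreement phase here simply counts local suspicion reports instead of exchanging messages between sites, as a real distributed protocol would have to do.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


class EventBus:
    """Minimal event dispatcher: handlers registered for an event name are
    invoked in registration order when that event is triggered."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[..., None]]] = defaultdict(list)

    def register(self, event: str, handler: Callable[..., None]) -> None:
        self._handlers[event].append(handler)

    def trigger(self, event: str, **args) -> None:
        for handler in list(self._handlers[event]):
            handler(**args)


class MembershipAdaptation:
    """Hypothetical membership protocol split into the three phases of the
    model; each phase is a separate handler linked to the next by an event."""

    def __init__(self, bus: EventBus, members: Set[str]) -> None:
        self.bus = bus
        self.members = set(members)
        self.votes: Dict[str, Set[str]] = defaultdict(set)
        bus.register("TIMEOUT", self.detect)
        bus.register("SUSPECT", self.agree)
        bus.register("AGREED_FAILED", self.act)

    # Phase 1, change detection: a timeout reported by a member raises a suspicion.
    def detect(self, reporter: str, suspect: str) -> None:
        if suspect in self.members and reporter in self.members:
            self.bus.trigger("SUSPECT", reporter=reporter, suspect=suspect)

    # Phase 2, agreement (assumed rule): a strict majority of members must suspect.
    def agree(self, reporter: str, suspect: str) -> None:
        self.votes[suspect].add(reporter)
        if len(self.votes[suspect]) > len(self.members) // 2:
            self.bus.trigger("AGREED_FAILED", suspect=suspect)

    # Phase 3, action: remove the failed site and announce the new membership.
    def act(self, suspect: str) -> None:
        if suspect in self.members:
            self.members.discard(suspect)
            self.votes.pop(suspect, None)
            self.bus.trigger("MEMBERSHIP_CHANGE", members=frozenset(self.members))


class RetransmissionPolicy:
    """Independent component composed in parallel on the same bus: it stops
    waiting for acknowledgments from sites that have left the group."""

    def __init__(self, bus: EventBus, destinations: Set[str]) -> None:
        self.destinations = set(destinations)
        bus.register("MEMBERSHIP_CHANGE", self.on_change)

    def on_change(self, members: frozenset) -> None:
        self.destinations &= members


bus = EventBus()
group = {"a", "b", "c", "d", "e"}
membership = MembershipAdaptation(bus, group)
retrans = RetransmissionPolicy(bus, group)
for reporter in ("a", "b", "c"):
    bus.trigger("TIMEOUT", reporter=reporter, suspect="e")
print(sorted(membership.members))  # ['a', 'b', 'c', 'd']
print(sorted(retrans.destinations))  # ['a', 'b', 'c', 'd']
```

Because each phase is a separate handler and the retransmission component subscribes only to the `MEMBERSHIP_CHANGE` event, either component can be added or removed without modifying the other. This is meant to suggest the kind of structural separation that an event-driven approach to parallel protocol composition can provide.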
This work was supported in part by the National Science Foundation under grant CCR-9003161 and the Office of Naval Research under grant N00014-91-J-1015.
Copyright information
© 1994 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hiltunen, M.A., Schlichting, R.D. (1994). A model for adaptive fault-tolerant systems. In: Echtle, K., Hammer, D., Powell, D. (eds) Dependable Computing — EDCC-1. EDCC 1994. Lecture Notes in Computer Science, vol 852. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58426-9_121
DOI: https://doi.org/10.1007/3-540-58426-9_121
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58426-1
Online ISBN: 978-3-540-48785-2
eBook Packages: Springer Book Archive