A model for adaptive fault-tolerant systems

Hiltunen, Matti A.; Schlichting, Richard D.

doi:10.1007/3-540-58426-9_121

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 852)

Included in the following conference series:

European Dependable Computing Conference

Abstract

An adaptive computing system is one that modifies its behavior based on changes in the environment. Since one common type of environment change in a distributed system is network or processor failure, fault-tolerant distributed systems can be viewed as an important subclass of adaptive systems. As such, use of adaptive methods for dealing with failures in this context has the same potential advantages of improved efficiency and structural simplicity as for adaptive systems in general. This paper describes a model for adaptive systems that can be applied in many failure scenarios arising in distributed systems. This model divides the adaptation process into three different phases—change detection, agreement, and action—that can be used as a common means for describing various fault-tolerance algorithms such as reliable transmission and membership protocols. This serves not only to clarify the logical structure and relationship of such algorithms, but also to provide a unifying implementation framework. Several adaptive fault-tolerant protocols are given as examples. A technique for implementing the model in a distributed system using an event-driven approach for composing protocols in parallel is also presented.
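The three phases named in the abstract can be illustrated with a minimal sketch. This is not the authors' framework: the `Site` class, the event names (`timeout`, `suspect`, `remove`), the `deliver` helper, and the majority-vote agreement rule are all assumptions chosen for illustration. It simulates a membership protocol in a single process, with each phase written as a handler that is triggered by an event.

```python
from collections import defaultdict


class Site:
    """One site's view of group membership (hypothetical illustration)."""

    def __init__(self, name, members):
        self.name = name
        self.members = set(members)
        self.votes = defaultdict(set)  # suspect -> sites that reported it
        self.outbox = []               # messages waiting to be broadcast
        # Event-driven composition: each phase is a handler bound to an event.
        self.handlers = {
            "timeout": self.detect,
            "suspect": self.agree,
            "remove": self.act,
        }

    def raise_event(self, event, *args):
        self.handlers[event](*args)

    def detect(self, suspect):
        # Phase 1, change detection: a local timeout makes this site
        # suspect a peer, and it queues a report for the other sites.
        if suspect in self.members:
            self.outbox.append(("suspect", self.name, suspect))

    def agree(self, reporter, suspect):
        # Phase 2, agreement: assumed rule, act only once a strict
        # majority of the remaining members has reported the suspect.
        if suspect not in self.members:
            return
        self.votes[suspect].add(reporter)
        survivors = self.members - {suspect}
        if len(self.votes[suspect] & survivors) > len(survivors) // 2:
            self.raise_event("remove", suspect)

    def act(self, suspect):
        # Phase 3, action: adapt by removing the failed site from the view.
        self.members.discard(suspect)
        self.votes.pop(suspect, None)


def deliver(sites):
    """Simulated reliable broadcast: flush every outbox to every live site."""
    for sender in sites:
        while sender.outbox:
            event, *args = sender.outbox.pop(0)
            for site in sites:
                site.raise_event(event, *args)
```

In a run with sites A, B, C, and D where D has failed, a single `timeout` report from A does not change any view under this rule; once B also reports D, two of the three survivors agree and every live site removes D. A real protocol would replace `deliver` with an actual transport and would have to handle concurrent failures, which this sketch ignores.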

This work was supported in part by the National Science Foundation under grant CCR-9003161 and the Office of Naval Research under grant N00014-91-J-1015.

Author information

Authors and Affiliations

Department of Computer Science, University of Arizona, 85721, Tucson, AZ, USA
Matti A. Hiltunen & Richard D. Schlichting

Editor information

Klaus Echtle, Dieter Hammer, David Powell

About this paper

Cite this paper

Hiltunen, M.A., Schlichting, R.D. (1994). A model for adaptive fault-tolerant systems. In: Echtle, K., Hammer, D., Powell, D. (eds) Dependable Computing — EDCC-1. EDCC 1994. Lecture Notes in Computer Science, vol 852. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58426-9_121

DOI: https://doi.org/10.1007/3-540-58426-9_121
Published: 07 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58426-1
Online ISBN: 978-3-540-48785-2
eBook Packages: Springer Book Archive
