
A model for adaptive fault-tolerant systems

  • Session 1: Fault-tolerance techniques
  • Conference paper
Dependable Computing — EDCC-1 (EDCC 1994)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 852)


Abstract

An adaptive computing system is one that modifies its behavior based on changes in the environment. Since one common type of environment change in a distributed system is network or processor failure, fault-tolerant distributed systems can be viewed as an important subclass of adaptive systems. As such, using adaptive methods to deal with failures in this context offers the same potential advantages of improved efficiency and structural simplicity as it does for adaptive systems in general. This paper describes a model for adaptive systems that can be applied in many failure scenarios arising in distributed systems. The model divides the adaptation process into three phases (change detection, agreement, and action) that can serve as a common means of describing various fault-tolerance algorithms, such as reliable transmission and membership protocols. This not only clarifies the logical structure of such algorithms and the relationships among them, but also provides a unifying implementation framework. Several adaptive fault-tolerant protocols are given as examples. A technique for implementing the model in a distributed system, using an event-driven approach to compose protocols in parallel, is also presented.
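The three-phase structure and the event-driven composition described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the `EventBus` class, the phase classes, and the event names (`heartbeat_missed`, `change_suspected`, `change_agreed`, `view_changed`) are all hypothetical. The agreement phase is a local majority count standing in for a real distributed agreement protocol. Each phase is a separate handler that reacts to one event and raises the next, and any number of handlers may bind to the same event, which loosely mirrors the idea of composing protocols through shared events.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


class EventBus:
    """Hypothetical synchronous dispatcher: handlers register for named events."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[..., None]]] = defaultdict(list)

    def register(self, event: str, handler: Callable[..., None]) -> None:
        # Several handlers may bind to one event; each runs when it is raised.
        self._handlers[event].append(handler)

    def raise_event(self, event: str, **args) -> None:
        # Call every handler bound to this event, in registration order.
        for handler in list(self._handlers[event]):
            handler(**args)


class ChangeDetection:
    """Phase 1: map a low-level symptom (a missed heartbeat) to a suspected change."""

    def __init__(self, bus: EventBus) -> None:
        self.bus = bus
        bus.register("heartbeat_missed", self.on_missed)

    def on_missed(self, reporter: str, peer: str) -> None:
        self.bus.raise_event("change_suspected", reporter=reporter, peer=peer)


class Agreement:
    """Phase 2: accept a suspicion once a majority of current members report it.

    Stand-in for a distributed agreement protocol; here all reports
    arrive at one local object.
    """

    def __init__(self, bus: EventBus, members: Set[str]) -> None:
        self.bus = bus
        self.members = members  # shared view, mutated by the Action phase
        self.reports: Dict[str, Set[str]] = defaultdict(set)
        self.decided: Set[str] = set()
        bus.register("change_suspected", self.on_suspected)

    def on_suspected(self, reporter: str, peer: str) -> None:
        # Ignore reports once a peer is decided, or when the peer or
        # reporter is not in the current view.
        if peer in self.decided or peer not in self.members or reporter not in self.members:
            return
        self.reports[peer].add(reporter)
        if len(self.reports[peer]) > len(self.members) // 2:
            self.decided.add(peer)
            self.bus.raise_event("change_agreed", peer=peer)


class Action:
    """Phase 3: adapt to the agreed change by removing the peer from the view."""

    def __init__(self, bus: EventBus, members: Set[str]) -> None:
        self.bus = bus
        self.members = members
        bus.register("change_agreed", self.on_agreed)

    def on_agreed(self, peer: str) -> None:
        self.members.discard(peer)
        self.bus.raise_event("view_changed", members=sorted(self.members))


if __name__ == "__main__":
    bus = EventBus()
    view = {"a", "b", "c", "d", "e"}
    ChangeDetection(bus)
    Agreement(bus, view)
    Action(bus, view)
    views: List[List[str]] = []
    bus.register("view_changed", lambda members: views.append(members))
    # Three of five members miss heartbeats from "e": a majority, so "e" is removed.
    for reporter in ("a", "b", "c"):
        bus.raise_event("heartbeat_missed", reporter=reporter, peer="e")
    print(views)  # [['a', 'b', 'c', 'd']]
```

Because dispatch in this sketch is synchronous, the third `heartbeat_missed` event runs detection, agreement, and action depth-first before `raise_event` returns; a real distributed implementation would deliver events asynchronously across sites and would need a genuine agreement protocol in the second phase.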

This work was supported in part by the National Science Foundation under grant CCR-9003161 and the Office of Naval Research under grant N00014-91-J-1015.

Editor information

Klaus Echtle, Dieter Hammer, David Powell

Copyright information

© 1994 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hiltunen, M.A., Schlichting, R.D. (1994). A model for adaptive fault-tolerant systems. In: Echtle, K., Hammer, D., Powell, D. (eds) Dependable Computing — EDCC-1. EDCC 1994. Lecture Notes in Computer Science, vol 852. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58426-9_121

  • DOI: https://doi.org/10.1007/3-540-58426-9_121

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-58426-1

  • Online ISBN: 978-3-540-48785-2

  • eBook Packages: Springer Book Archive
