Adaptable fault tolerance for distributed process control using exclusively standard components

  • Session 1 Distributed Fault Tolerance
  • Conference paper
Dependable Computing — EDCC-2 (EDCC 1996)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1150)

Abstract

This paper describes an adaptable fault tolerance architecture for distributed process control that uses exclusively standard hardware, standard system software and standard protocols. It offers a quick, low-cost way to provide continuous service to technical facilities and plants that are not safety-critical, and it is designed to be as practical as possible for application engineers. The architecture is composed of well-known fault tolerance methods under the constraints of real-time requirements. The latitude that non-safety-critical applications allow is used carefully to minimize the fault tolerance overhead. Because the fault tolerance is transparent, each functional part of the process control, represented by an application task, can be implemented without regard to non-determinism or to the hosts that execute it. A control system is configured simply by naming hosts, tasks and groups in a file, in which every task must be declared together with its selected fault tolerance strategy.
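
To make the configuration idea concrete, the following is a minimal sketch of what such a file and its validation might look like. The abstract does not give the file syntax, so the keywords (`host`, `group`, `task`), the strategy names and the function `parse_config` are all assumptions made for illustration, not the authors' format.

```python
from dataclasses import dataclass

# Hypothetical configuration text. The real file syntax is not given in the
# abstract; this layout is an assumption made for illustration only.
EXAMPLE_CONFIG = """
host  h1
host  h2
host  h3
group control  h1 h2
group logging  h2 h3
task  valve_ctrl  control  active_replication
task  trend_log   logging  cold_standby
"""

# Hypothetical strategy names; the paper's actual set of strategies may differ.
KNOWN_STRATEGIES = {"none", "cold_standby", "hot_standby", "active_replication"}


@dataclass
class Task:
    name: str
    group: str
    strategy: str


@dataclass
class Config:
    hosts: list
    groups: dict
    tasks: list


def parse_config(text):
    """Parse the hypothetical format and check that every task names a
    declared group and an explicit fault tolerance strategy."""
    hosts, groups, tasks = [], {}, []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        fields = raw.split("#", 1)[0].split()  # allow trailing comments
        if not fields:
            continue
        kind, args = fields[0], fields[1:]
        if kind == "host" and len(args) == 1:
            hosts.append(args[0])
        elif kind == "group" and len(args) >= 2:
            groups[args[0]] = args[1:]
        elif kind == "task":
            if len(args) != 3:
                raise ValueError(
                    f"line {lineno}: a task needs a name, a group and a strategy")
            tasks.append(Task(*args))
        else:
            raise ValueError(f"line {lineno}: cannot parse {raw!r}")

    # Cross-checks run after parsing so declarations may appear in any order.
    for name, members in groups.items():
        for host in members:
            if host not in hosts:
                raise ValueError(f"group {name}: unknown host {host}")
    for task in tasks:
        if task.group not in groups:
            raise ValueError(f"task {task.name}: unknown group {task.group}")
        if task.strategy not in KNOWN_STRATEGIES:
            raise ValueError(f"task {task.name}: unknown strategy {task.strategy}")
    return Config(hosts, groups, tasks)


config = parse_config(EXAMPLE_CONFIG)
```

In this sketch a task declared without a strategy is rejected at parse time, which mirrors the requirement that every individual task be declared with its selected strategy.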

A fault-tolerant system can be expected to reconfigure itself automatically after a fault. The present system does more: it automatically reintegrates repaired hosts and re-establishes redundant operation while the entire system keeps working.
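
The difference between plain reconfiguration and reintegration can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `Cluster`, its method names and the fixed redundancy degree of two are hypothetical, and the model ignores state transfer, message ordering and real-time constraints, all of which the actual architecture has to handle.

```python
class Cluster:
    """Toy model of one host group running replicated tasks.

    Hypothetical illustration only; not the protocol described in the paper.
    """

    def __init__(self, group_hosts, tasks, degree=2):
        self.group_hosts = list(group_hosts)
        self.up = set(group_hosts)          # hosts currently working
        self.degree = degree                # wanted number of replicas per task
        self.replicas = {task: [] for task in tasks}
        self._rebalance()

    def _rebalance(self):
        """Drop replicas on failed hosts, then refill from working hosts."""
        for running in self.replicas.values():
            running[:] = [h for h in running if h in self.up]
            for host in self.group_hosts:
                if len(running) >= self.degree:
                    break
                if host in self.up and host not in running:
                    running.append(host)

    def host_failed(self, host):
        # Reconfiguration: surviving replicas carry on without the failed host.
        self.up.discard(host)
        self._rebalance()

    def host_repaired(self, host):
        # Reintegration: the repaired host rejoins and redundancy is restored.
        self.up.add(host)
        self._rebalance()

    def in_service(self, task):
        return bool(self.replicas[task])

    def redundant(self, task):
        return len(self.replicas[task]) >= self.degree


cluster = Cluster(["h1", "h2"], ["valve_ctrl"])
cluster.host_failed("h1")
service_during_fault = cluster.in_service("valve_ctrl")
redundant_during_fault = cluster.redundant("valve_ctrl")
cluster.host_repaired("h1")
redundant_after_repair = cluster.redundant("valve_ctrl")
```

In this model the task stays in service on `h2` while `h1` is down, but without redundancy; once `h1` is repaired it is added back as a replica without the task ever leaving service.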

Editor information

Andrzej Hlawiczka, João Gabriel Silva, Luca Simoncini

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bohne, J., Grönberg, R. (1996). Adaptable fault tolerance for distributed process control using exclusively standard components. In: Hlawiczka, A., Silva, J.G., Simoncini, L. (eds) Dependable Computing — EDCC-2. EDCC 1996. Lecture Notes in Computer Science, vol 1150. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61772-8_28

  • DOI: https://doi.org/10.1007/3-540-61772-8_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-61772-3

  • Online ISBN: 978-3-540-70677-9

  • eBook Packages: Springer Book Archive
