Patterns for Light-Weight Fault Tolerance and Decoupled Design in Distributed Control Systems

  • Pekka Alho
  • Jari Rauhamäki
Chapter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10600)

Abstract

Distributed control systems comprise networked computing units that monitor and control physical processes in feedback loops. The reliability of these systems is affected by dynamic and complex computing environments in which connections and system configurations may change rapidly. Diverse redundancy can be effective in improving system dependability, but it is susceptible to common-mode failures, and the development costs of design diversity are often seen as prohibitive. In this paper we present three patterns that provide a light-weight form of fault tolerance, improving system dependability and resilience by giving the system the ability to cope with unexpected events and faults. The patterns are presented together with a pattern language that shows how they relate to other fault tolerance patterns.
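The remark about common-mode failures can be illustrated with a minimal sketch of majority voting over diversely implemented versions. The three `version_*` functions, the gain of 2.0 and the `LIMIT` value are hypothetical and do not come from the paper; the sketch only shows how a fault shared by two supposedly independent versions outvotes the one correct version, which is why design diversity on its own does not guarantee fault tolerance.

```python
from collections import Counter
from typing import Callable, Sequence

LIMIT = 10.0  # hypothetical actuator limit from an imagined specification


def majority_vote(outputs: Sequence[float]) -> float:
    """Return the value produced by a strict majority of versions."""
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise RuntimeError("no majority: versions disagree")
    return value


def version_a(error: float) -> float:
    # Hypothetical correct version: gain 2.0, clamped to +/-LIMIT.
    return max(-LIMIT, min(LIMIT, 2.0 * error))


def version_b(error: float) -> float:
    # Hypothetical second team: misreads the limit as 100 (shared fault).
    return max(-100.0, min(100.0, 2.0 * error))


def version_c(error: float) -> float:
    # Hypothetical third team: different code, same misreading of the spec.
    out = 2.0 * error
    return out if abs(out) <= 100.0 else (100.0 if out > 0 else -100.0)


VERSIONS: Sequence[Callable[[float], float]] = (version_a, version_b, version_c)


def voted_output(error: float) -> float:
    """Run every version and let the voter pick the majority result."""
    return majority_vote([v(error) for v in VERSIONS])


print(voted_output(1.0))   # 2.0: all versions agree and are correct
print(voted_output(20.0))  # 40.0: b and c outvote the correct a (10.0)
```

Because the voter only checks agreement, it cannot tell a correct majority from a majority that shares the same specification fault.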

Keywords

Dependability, Distributed systems, Fault tolerance, Real-time systems, Reliability

Notes

Acknowledgements

The authors would like to thank the reviewers and the VikingPLoP 2013 participants for their feedback and valuable comments, Robert Hanmer for shepherding the paper, and the VikingPLoP organizers for a great pattern conference. This work was carried out under the EFDA Goal Oriented Training Programme (WP10-GOT-GOTRH) with the financial support of TEKES, both of which are gratefully acknowledged. The views and opinions expressed herein do not necessarily reflect those of the European Commission.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Intelligent Hydraulics and Automation, Tampere University of Technology, Tampere, Finland
  2. Department of Automation Science and Engineering, Tampere University of Technology, Tampere, Finland