Distributed Computing, Volume 32, Issue 1, pp 1–25

Automation of fault-tolerant graceful degradation

  • Yiyan Lin
  • Sandeep Kulkarni
  • Arshad Jhumka

Abstract

Traditionally, fault-tolerance (both nonmasking and masking) has focused on ensuring that, after faults occur, the program recovers to states from which it continues to satisfy its original specification. However, this notion is limited: in some cases, it may be impossible to recover to states from which the entire original specification is satisfied. For this reason, one can consider a fault-tolerant gracefully degrading program, which ensures that, upon the occurrence of faults, the program recovers to states from which a given subset of its specification is satisfied. Typically, this subset consists of the critical requirements. In this paper, we first focus on automatically revising a given fault-intolerant program into a fault-tolerant gracefully degrading program. Specifically, we propose a two-step approach. In the first step, we transform the fault-intolerant program into a graceful program, which is guaranteed to satisfy only the given subset of the specification (e.g., the critical requirements). In particular, this step involves adding new behaviors that satisfy the given subset of the specification. In the second step, we use the original program and the graceful program to obtain a fault-tolerant gracefully degrading program. We also develop an algorithm to transform the gracefully degrading program into a distributed gracefully degrading program. Afterwards, the second phase of our transformation can be applied to generate a distributed fault-tolerant gracefully degrading program. We demonstrate the algorithm with three different non-trivial case studies. Finally, we formalize the problem of multi-graceful degradation and propose an algorithm that solves it, and we use a complex case study to show the viability of the approach. All the algorithms run in time polynomial in the size of the state space of the original program.
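The distinction between full recovery and graceful degradation can be illustrated on a toy finite state space. The sketch below is not the paper's synthesis algorithm; it only classifies states of a hypothetical program. The transition sets `program` and `faults` and the state sets `full_ok` and `critical_ok` are assumptions made up for this example, and the check is existential (some path to a good state exists), which is weaker than guaranteed recovery.

```python
from collections import deque

def reachable(start, transitions):
    """Return every state reachable from `start` by following `transitions`."""
    seen = set(start)
    queue = deque(seen)
    while queue:
        s = queue.popleft()
        for (a, b) in transitions:
            if a == s and b not in seen:
                seen.add(b)
                queue.append(b)
    return seen

def can_reach(targets, transitions):
    """Return every state from which some state in `targets` is reachable."""
    reversed_edges = {(b, a) for (a, b) in transitions}
    return reachable(targets, reversed_edges)

# Hypothetical toy model (all values are illustrative assumptions).
program = {(0, 1), (1, 0), (2, 0), (3, 4), (4, 4)}  # program transitions
faults = {(1, 2), (1, 3)}                           # fault transitions
full_ok = {0, 1}         # states from which the full specification holds
critical_ok = {0, 1, 4}  # states from which the critical subset holds

# States the program may visit once faults are taken into account.
fault_span = reachable(full_ok, program | faults)

# States that can still get back to full service using program steps only.
full_recovery = can_reach(full_ok, program) & fault_span

# States that can only get back to the critical subset of the specification.
degraded = (can_reach(critical_ok, program) & fault_span) - full_recovery
```

In this toy model, the fault taking state 1 to state 2 still allows full recovery, while the fault taking state 1 to state 3 leaves only the critical subset reachable. Both helper functions are breadth-first searches over the state space, so the classification is polynomial in the number of states and transitions.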

Notes

Acknowledgements

This work is supported by NSF CNS 1329807, NSF CNS 1318678, and XPS 1533802.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2017

Authors and Affiliations

  1. Department of Computer Science and Engineering, Michigan State University, East Lansing, USA
  2. Department of Computer Science, University of Warwick, Coventry, UK