Experimental Assessment of Cloud Software Dependability Using Fault Injection

  • Lena Herscheid
  • Daniel Richter
  • Andreas Polze
Conference paper
Part of the IFIP Advances in Information and Communication Technology book series (IFIPAICT, volume 450)


In modern cloud software systems, the complexity arising from feature interaction, geographical distribution, security and configurability requirements increases the likelihood of faults. Additional influencing factors are the impact of different execution environments as well as human operation or configuration errors. Assuming that any non-trivial cloud software system contains faults, robustness testing is needed to ensure that such faults are discovered as early as possible, and that the overall service is resilient and fault tolerant. To this end, fault injection is a means for disrupting the software in ways that uncover bugs and test the fault tolerance mechanisms. In this paper, we discuss how to experimentally assess software dependability in two steps. First, a model of the software is constructed from different runtime observations and configuration information. Second, this model is used to orchestrate fault injection experiments with the running software system in order to quantify dependability attributes such as service availability. We propose the architecture of a fault injection service within the OpenStack project.
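The two steps described above can be illustrated with a minimal sketch. It is not the authors' architecture and uses no OpenStack API: the configuration format, the component names (api, db, queue) and the functions build_model, service_available and run_campaign are all hypothetical, and faults are simulated in memory rather than injected into a running system. The sketch assumes a service is available as long as every component it transitively depends on has at least one live replica.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical configuration data; a real model would also draw on runtime observations.
EXAMPLE_CONFIG = {
    "api": {"depends_on": ["db", "queue"], "replicas": 2},
    "db": {"depends_on": [], "replicas": 1},
    "queue": {"depends_on": [], "replicas": 2},
}


@dataclass
class Component:
    name: str
    depends_on: List[str] = field(default_factory=list)
    replicas: int = 1


def build_model(config: Dict[str, dict]) -> Dict[str, Component]:
    """Step 1: derive a dependency model from configuration information."""
    return {
        name: Component(name, list(spec.get("depends_on", [])), spec.get("replicas", 1))
        for name, spec in config.items()
    }


def service_available(model: Dict[str, Component], root: str, failed: Dict[str, int]) -> bool:
    """Return True if root and all transitive dependencies keep at least one live replica.

    failed maps a component name to the number of its replicas that were crashed.
    """
    seen = set()
    stack = [root]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        component = model[name]
        if failed.get(name, 0) >= component.replicas:
            return False
        stack.extend(component.depends_on)
    return True


def run_campaign(model: Dict[str, Component], root: str, trials: int,
                 faults_per_trial: int, seed: int = 0) -> float:
    """Step 2: run simulated crash-fault experiments and estimate availability.

    Each trial crashes faults_per_trial distinct replicas chosen uniformly at random
    and records whether the service is still available. The result is the fraction
    of trials in which it was.
    """
    rng = random.Random(seed)
    # One injection target per replica of every component in the model.
    targets = [name for name, c in model.items() for _ in range(c.replicas)]
    survived = 0
    for _ in range(trials):
        failed: Dict[str, int] = {}
        for name in rng.sample(targets, faults_per_trial):
            failed[name] = failed.get(name, 0) + 1
        if service_available(model, root, failed):
            survived += 1
    return survived / trials


if __name__ == "__main__":
    model = build_model(EXAMPLE_CONFIG)
    print(run_campaign(model, "api", trials=1000, faults_per_trial=1))
```

In this toy model the single-replica db component is the only single point of failure, so a campaign with one fault per trial mainly measures how often that component is hit. A real experiment would replace the in-memory failure map with actual fault injectors and observe the running service instead of evaluating the model.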


Fault injection, Dependability, Distributed systems, Cloud systems, Availability



Copyright information

© IFIP International Federation for Information Processing 2015

Authors and Affiliations

  1. Hasso Plattner Institute, Potsdam, Germany
