Damage Reduction via White-Box Failure Shaping

  • Conference paper
Search-Based Software Engineering (SSBSE 2018)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 11036)

Abstract

Emerging hardware that trades reliability guarantees for resource savings presents a challenge to software engineered for deterministic execution. Research areas like approximate computing, however, embrace non-determinism by abandoning strict correctness in favor of maximizing the probability and degree of correctness. Existing work has used stochastic failure sampling to perform white-box searches along software execution paths, producing criticality assessments of which selected operations are likely most damaging if they fail. Here, we apply these assessments to a new domain and employ them using failure shaping, an automated method for reducing a computation’s expected output damage in a model where failures can be relocated but not eliminated. In two case studies, we demonstrate error reductions of 38% to 63% on Strassen’s matrix multiplication algorithm despite a virtually identical failure count. We discuss how our framework helps provide a smooth landscape for performing the search-based software engineering that will be required to apply this technology to larger problems.
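
As a toy illustration of the idea summarized above, and not the authors' framework or their Strassen experiments, the following sketch estimates per-operation criticality by stochastic failure sampling and then compares two placements of the same number of failures. Everything in it is an assumption made for illustration: the workload (a weighted sum), the fault model (a failed multiplication yields 0), and the hypothetical helper names `run`, `estimate_criticality`, and `mean_damage`.

```python
import random

def run(weights, xs, failed):
    """Weighted sum; a multiplication whose index is in `failed` yields 0 (assumed fault model)."""
    return sum(0.0 if i in failed else w * x
               for i, (w, x) in enumerate(zip(weights, xs)))

def estimate_criticality(weights, samples, rng):
    """Mean absolute output error when operation i alone fails, over random inputs."""
    n = len(weights)
    totals = [0.0] * n
    for _ in range(samples):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        exact = run(weights, xs, set())
        for i in range(n):
            totals[i] += abs(run(weights, xs, {i}) - exact)
    return [t / samples for t in totals]

def mean_damage(weights, choose_failed, trials, rng):
    """Mean absolute output error when the set returned by `choose_failed` fails."""
    n = len(weights)
    total = 0.0
    for _ in range(trials):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        exact = run(weights, xs, set())
        total += abs(run(weights, xs, choose_failed()) - exact)
    return total / trials

rng = random.Random(0)                      # fixed seed for repeatability
weights = [8.0, 4.0, 2.0, 1.0, 0.5, 0.25]   # hypothetical workload
k = 2                                       # failures per run; relocated, never removed

# Criticality assessment: which operations hurt the output most when they fail?
crit = estimate_criticality(weights, 2000, rng)
least_critical = set(sorted(range(len(weights)), key=crit.__getitem__)[:k])

# Same failure count in both cases; only the location of the failures differs.
unshaped = mean_damage(weights,
                       lambda: set(rng.sample(range(len(weights)), k)),
                       2000, rng)
shaped = mean_damage(weights, lambda: least_critical, 2000, rng)
print(f"unshaped={unshaped:.3f} shaped={shaped:.3f}")
```

Both placements incur exactly `k` failures per run, so any difference in mean damage comes only from where the failures land. This is the sense in which failures are relocated rather than eliminated.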

Author information

Corresponding author

Correspondence to Thomas B. Jones.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Jones, T.B., Ackley, D.H. (2018). Damage Reduction via White-Box Failure Shaping. In: Colanzi, T., McMinn, P. (eds) Search-Based Software Engineering. SSBSE 2018. Lecture Notes in Computer Science, vol 11036. Springer, Cham. https://doi.org/10.1007/978-3-319-99241-9_11

  • DOI: https://doi.org/10.1007/978-3-319-99241-9_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99240-2

  • Online ISBN: 978-3-319-99241-9

  • eBook Packages: Computer Science, Computer Science (R0)
