Abstract
Emerging hardware that trades reliability guarantees for resource savings presents a challenge to software engineered for deterministic execution. Research areas like approximate computing, however, embrace non-determinism by abandoning strict correctness in favor of maximizing the probability and degree of correctness. Existing work has used stochastic failure sampling to perform white-box searches along software execution paths, producing criticality assessments of which selected operations are likely most damaging if they fail. Here, we apply these assessments to a new domain and employ them using failure shaping, an automated method for reducing a computation’s expected output damage in a model where failures can be relocated but not eliminated. In two case studies, we demonstrate error reductions of 38% to 63% on Strassen’s matrix multiplication algorithm despite a virtually identical failure count. We discuss how our framework helps provide a smooth landscape for performing the search-based software engineering that will be required to apply this technology to larger problems.
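As a rough illustration of the ideas above, the following sketch estimates a per-operation criticality for one level of Strassen's algorithm on 2x2 matrices by stochastic failure sampling. It is not the authors' implementation: the failure model (a failed product returns 0), the damage metric (summed absolute output error), and the names `strassen_2x2`, `estimate_criticality`, and `least_critical` are all assumptions made for this example.

```python
import random


def strassen_2x2(a, b, failed=None):
    """One level of Strassen's algorithm on 2x2 matrices of numbers.

    `failed` is the index (0-6) of the product forced to fail. The failure
    model used here (the product evaluates to 0) is an assumption.
    """
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    m = [
        (a11 + a22) * (b11 + b22),  # M1
        (a21 + a22) * b11,          # M2
        a11 * (b12 - b22),          # M3
        a22 * (b21 - b11),          # M4
        (a11 + a12) * b22,          # M5
        (a21 - a11) * (b11 + b12),  # M6
        (a12 - a22) * (b21 + b22),  # M7
    ]
    if failed is not None:
        m[failed] = 0  # assumed failure: the product is lost
    m1, m2, m3, m4, m5, m6, m7 = m
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


def estimate_criticality(trials=2000, seed=0):
    """Mean output damage caused by failing each of the seven products."""
    rng = random.Random(seed)
    damage = [0.0] * 7
    for _ in range(trials):
        a = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
        b = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
        exact = strassen_2x2(a, b)
        for k in range(7):
            faulty = strassen_2x2(a, b, failed=k)
            # Assumed damage metric: summed absolute error over all outputs.
            damage[k] += sum(
                abs(x - y)
                for row_e, row_f in zip(exact, faulty)
                for x, y in zip(row_e, row_f)
            )
    return [d / trials for d in damage]


def least_critical(criticality):
    """Index of the product where a failure is expected to hurt least."""
    return min(range(len(criticality)), key=criticality.__getitem__)


if __name__ == "__main__":
    scores = estimate_criticality()
    for k, score in enumerate(scores, start=1):
        print(f"M{k}: mean damage {score:.3f}")
    print("Least critical product: M%d" % (least_critical(scores) + 1))
```

Under this toy model, a shaping policy would steer an unavoidable failure toward the product reported by `least_critical` rather than toward a product that feeds two output entries. The failure count stays the same while the expected damage drops, which is the effect the abstract describes.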
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Jones, T.B., Ackley, D.H. (2018). Damage Reduction via White-Box Failure Shaping. In: Colanzi, T., McMinn, P. (eds) Search-Based Software Engineering. SSBSE 2018. Lecture Notes in Computer Science, vol 11036. Springer, Cham. https://doi.org/10.1007/978-3-319-99241-9_11
DOI: https://doi.org/10.1007/978-3-319-99241-9_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99240-2
Online ISBN: 978-3-319-99241-9
eBook Packages: Computer Science (R0)