How verified (or tested) is my code? Falsification-driven verification and testing

  • Alex Groce
  • Iftekhar Ahmed
  • Carlos Jensen
  • Paul E. McKenney
  • Josie Holmes

Abstract

Formal verification has advanced to the point that developers can verify the correctness of small, critical modules. Unfortunately, despite considerable effort, determining whether a “verification” verifies what its author intends is still difficult. Previous approaches to this problem are hard to understand and often limited in applicability. Developers need verification coverage expressed in terms of the software they are verifying, not model-checking diagnostics. We propose a methodology that allows developers to determine (and correct) what it is that they have verified, and tools to support that methodology. Our basic approach rests on a novel variation of mutation analysis and on the idea of verification driven by falsification. Using the CBMC model checker, we show that this approach applies not only to simple data structures and sorting routines, and to the verification of a routine in Mozilla’s JavaScript engine, but also to understanding an ongoing effort to verify the Linux kernel read-copy-update (RCU) mechanism. Moreover, we show that despite the probabilistic nature of random testing, and the tendency of testing (as opposed to verification) toward incompleteness, the same techniques, with suitable modifications, apply to automated test generation as well as to formal verification. In essence, it is the number of surviving mutants that drives the scalability of our methods, not the underlying method for detecting faults in a program. From the point of view of a Popperian analysis, where an unkilled mutant is a weakness (in terms of its falsifiability) in a “scientific theory” of program behavior, it is only the number of weaknesses to be examined by a user that is important.
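To make the core idea concrete, here is a minimal sketch (not taken from the paper) of mutation analysis applied to a verification harness, in the style of a CBMC proof. The function max3, the dropped-line mutant, and both properties are hypothetical illustrations: the weak assertion is satisfied even by the mutant, so the mutant survives and flags a gap in what has actually been verified, while the stronger assertion kills it.

    /* Hypothetical sketch of falsification-driven verification with CBMC.
     * max3 and its properties are illustrative, not from the paper. */
    #include <assert.h>

    int nondet_int(void);   /* CBMC treats an undefined function as a
                               source of nondeterministic inputs */

    static int max3(int a, int b, int c) {
        int m = a;
        if (b > m) m = b;
        if (c > m) m = c;   /* a mutant that deletes this line survives
                               the weak property below */
        return m;
    }

    int main(void) {
        int a = nondet_int(), b = nondet_int(), c = nondet_int();
        int m = max3(a, b, c);

        /* Weak property: also holds for the mutant, so the mutant is
         * unkilled -- a falsifiability gap in the "verification". */
        assert(m >= a && m >= b);

        /* Stronger property that kills the mutant:
         * assert(m >= a && m >= b && m >= c &&
         *        (m == a || m == b || m == c));
         */
        return 0;
    }

    /* Check the harness, and each mutant of max3, with e.g.: cbmc harness.c */

Under the methodology the abstract describes, each surviving mutant is a candidate weakness: either the specification is strengthened until the mutant is killed, or the mutant is shown to be equivalent to the original code.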

Keywords

Formal verification · Random testing · Mutation testing · Philosophy of science · Falsification · Oracles

Notes

Acknowledgements

A portion of this work was funded by NSF Grants CCF-1217824 and CCF-1054786.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Alex Groce (1)
  • Iftekhar Ahmed (2)
  • Carlos Jensen (2)
  • Paul E. McKenney (3)
  • Josie Holmes (1)

  1. School of Informatics, Computing and Cyber Systems, Northern Arizona University, Flagstaff, USA
  2. School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, USA
  3. IBM Linux Technology Center, Beaverton, USA
