Bucketing Failing Tests via Symbolic Analysis

  • Van-Thuan Pham (Email author)
  • Sakaar Khurana
  • Subhajit Roy
  • Abhik Roychoudhury
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10202)


A common problem encountered while debugging programs is the overwhelming number of test cases generated by automated test generation tools, where many of the tests are likely to fail due to the same bug. Some coarse-grained clustering techniques based on point of failure (PFB) and stack hash (CSB) have been proposed to address the problem. In this work, we propose a new symbolic analysis-based clustering algorithm that uses the semantic reason behind failures to group failing tests into more “meaningful” clusters. We implement our algorithm within the KLEE symbolic execution engine; our experiments on 21 programs drawn from multiple benchmark suites show that our technique is effective at producing finer-grained clusters than the PFB and CSB clustering schemes. As a side effect, our technique also provides a semantic characterization of the fault represented by each cluster—a precious hint to guide debugging. A user study conducted among senior undergraduates and masters students further confirms the utility of our test clustering method.
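The contrast among the three bucketing schemes described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the failure records, crash locations, stacks, and reason strings are invented for the example, and in the actual technique the semantic reason is derived from path conditions collected during symbolic execution rather than supplied by hand.

```python
from collections import defaultdict

# Hypothetical failing-test records: each carries the crash location,
# the call stack at failure, and a simplified "semantic reason"
# (standing in for the core of the failure's path condition).
failures = [
    {"id": "t1", "loc": "parse.c:42", "stack": ("main", "parse"),
     "reason": "len > buf_size"},
    {"id": "t2", "loc": "parse.c:42", "stack": ("main", "parse"),
     "reason": "len > buf_size"},
    {"id": "t3", "loc": "parse.c:42", "stack": ("main", "read", "parse"),
     "reason": "ptr == NULL"},
    {"id": "t4", "loc": "parse.c:42", "stack": ("main", "parse"),
     "reason": "ptr == NULL"},
]

def bucket(tests, key):
    """Group failing tests into clusters by the given key function."""
    clusters = defaultdict(list)
    for t in tests:
        clusters[key(t)].append(t["id"])
    return dict(clusters)

pfb = bucket(failures, key=lambda t: t["loc"])       # point of failure
csb = bucket(failures, key=lambda t: t["stack"])     # call-stack hash
sem = bucket(failures, key=lambda t: t["reason"])    # semantic reason
```

Here PFB lumps all four tests into one cluster (same crash location), and CSB splits them by stack into {t1, t2, t4} and {t3}, mixing two distinct faults. Bucketing by the semantic reason yields {t1, t2} and {t3, t4}, separating the buffer-overflow failures from the null-dereference failures even though they crash at the same statement.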


Keywords: Path Condition · Symbolic Execution · Subject Program · Symbolic Analysis · Semantic Characterization



This research is supported in part by the National Research Foundation, Prime Minister’s Office, Singapore under its National Cybersecurity R&D Program (TSUNAMi project, Award No. NRF2014NCR-NCR001-21) and administered by the National Cybersecurity R&D Directorate.



Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  • Van-Thuan Pham (1) (Email author)
  • Sakaar Khurana (2)
  • Subhajit Roy (2)
  • Abhik Roychoudhury (1)
  1. National University of Singapore, Singapore, Singapore
  2. Indian Institute of Technology Kanpur, Kanpur, India
