Skip to main content
Log in

Z3str2: an efficient solver for strings, regular expressions, and length constraints

  • Published:
Formal Methods in System Design Aims and scope Submit manuscript

Abstract

In recent years, string solvers have become an essential component in many formal verification, security analysis, and bug-finding tools. Such solvers typically support a theory of string equations, the length function, and the regular-expression membership predicate. These enable considerable expressive power, which comes at the cost of slow solving time, and in some cases even non-termination. We present three techniques, designed for word-based SMT string solvers, to mitigate these problems: (1) detecting overlapping variables, which is essential to avoiding common cases of non-termination; (2) pruning of the search space via bi-directional integration between the string and integer theories, enabling new cross-domain heuristics; and (3) a binary search based heuristic, allowing the procedure to skip unnecessary string length queries and converge on consistent length assignments faster for large strings. We have implemented above techniques atop the Z3-str solver, resulting in a significantly more robust and efficient solver, dubbed Z3str2, for the quantifier-free theory of string equations, the regular-expression membership predicate, and linear arithmetic over the length function. We report on a series of experiments over four sets of challenging real-world benchmarks, where we compare Z3str2 with five different string solvers: S3, CVC4, Kaluza, PISA and Stranger. Each of these tools utilizes a different solving strategy and/or string representation (based e.g. on words, bit vectors or automata). The results point to the efficacy of our proposed techniques, which yield dramatic performance improvement. We also demonstrate performance improvements enabled by Z3str2 in the context of symbolic execution for string-manipulating programs. We observe that the techniques presented here are of broad applicability, and can be integrated into other string solvers to improve their performance.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18
Fig. 19
Fig. 20
Fig. 21

Similar content being viewed by others

Notes

  1. The Z3str2 code, as well as the data pertaining to our experiments, are all available at [1].

  2. This restriction is made only for consistency with the SMT-LIB standard [7] in a typical implementation. The only integer literals allowed in the SMT-LIB syntax are 0 and positive integers. Note that both the SMT-LIB standard and the syntax we give here still allow negative integer constants to be expressed as, for example, (- 0 3) for \(-3\).

  3. While we do support AND–OR combination of word equations, without loss of generality it is sufficient to describe the procedure only for a conjunction of word equations.

  4. https://sites.google.com/site/z3strsolver/.

References

  1. Z3str2 String Constraint Solver. https://sites.google.com/site/z3strsolver/

  2. IBM Security AppScan Source. http://www-03.ibm.com/software/products/en/appscan-source

  3. Kausler Suite. https://github.com/BoiseState/string-constraint-solvers

  4. LibStranger. https://github.com/vlab-cs-ucsb/LibStranger

  5. Personal communications with the Stranger team. 2015

  6. Abdulla PA, Atig MF, Chen Y-F, Holík L, Rezine A, Rümmer P, Stenman J (2014) String constraints for verification. In: Proceedings of the 26th international conference on computer aided verification, CAV’14, pp 150–166

  7. Barrett C, Fontaine P, Tinelli C (2016) The Satisfiability Modulo Theories Library (SMT-LIB). http://www.SMT-LIB.org

  8. Bjørner N, Tillmann N, Voronkov A (2009) Path feasibility analysis for string-manipulating programs. In: Proceedings of the 15th international conference on tools and algorithms for the construction and analysis of systems, TACAS ’09, pp 307–321

  9. Cristian C, Daniel D, Dawson E (2008) Klee: Unassisted and automatic generation of high-coverage tests for complex systems programs. In: Proceedings of the 8th USENIX conference on operating systems design and implementation, OSDI’08. USENIX Association, Berkeley, pp 209–224

  10. Chipounov V, Kuznetsov V, Candea G (2011) S2e: a platform for in-vivo multi-path analysis of software systems. In: Proceedings of the sixteenth international conference on architectural support for programming languages and operating systems. ASPLOS XVI. ACM, New York, pp 265–278

  11. Christensen AS, Møller A, Schwartzbach MI (2003) Precise analysis of string expressions. In: Proceedings of the 10th international conference on static analysis, SAS’03, pp 1–18

  12. De Moura L, Bjørner N (2008) Z3: an efficient smt solver. In: Proceedings of the theory and practice of software, 14th international conference on Tools and algorithms for the construction and analysis of systems, TACAS’08, pp 337–340

  13. Ganesh V, Dill DL (2007) A decision procedure for bit-vectors and arrays. In: Proceedings of the 19th international conference on computer aided verification, CAV’07, pp 519–531

  14. Ganesh V, Minnes M, Solar-Lezama A, Rinard M (2012) Word equations with length constraints: what’s decidable? In: HVC’12

  15. Ghosh I, Shafiei N, Li G, Chiang W-F (2013) JST: an automatic test generation tool for industrial java applications with strings. In: Proceedings of the 2013 international conference on software engineering, ICSE ’13, pp 992–1001

  16. Hooimeijer P, Weimer W (2010) Solving string constraints lazily. In: Proceedings of the IEEE/ACM international conference on automated software engineering, ASE ’10, pp 377–386

  17. Hopcroft JE, Motwani R, Ullman JD (2007) Introduction to automata theory, languages, and computation. Pearson/Addison Wesley, Reading

    MATH  Google Scholar 

  18. Jeon J, Qiu X, Fetter-Degges J, Foster JS, Solar-Lezama A (2016) Synthesizing framework models for symbolic execution. In: Proceedings of the 38th international conference on software engineering, ICSE ’16. ACM, New York, pp 156–167

  19. Jeż A (2013) Recompression: word equations and beyond. In: Developments in language theory, Lecture Notes in Computer Science, pp 12–26

  20. Karhumäki J, Mignosi F, Plandowski W (2000) The expressibility of languages and relations by word equations. J ACM 47(3):483–505

    Article  MathSciNet  MATH  Google Scholar 

  21. Kausler S (2014) Evaluation of string constraint solvers using dynamic symbolic execution. Master’s Thesis, Boise State University

  22. Kausler S, Sherman E (2014) Evaluation of string constraint solvers in the context of symbolic execution. In: Proceedings of the 29th ACM/IEEE international conference on automated software engineering, ASE ’14. ACM, New York, pp 259–270

  23. Kiezun A, Ganesh V, Guo PJ, Hooimeijer P, Ernst MD (2009) Hampi: a solver for string constraints. In: Proceedings of the eighteenth international symposium on software testing and analysis, ISSTA ’09, pp 105–116

  24. Li G, Andreasen E, Ghosh I (2014) SymJS: automatic symbolic testing of javascript web applications. In: Proceedings of the 22Nd ACM SIGSOFT international symposium on foundations of software engineering, FSE 2014, pp 449–459

  25. Li G, Ghosh I (2013) PASS: string solving with parameterized array and interval automaton. In: 9th international Haifa verification conference, HVC ’13, pp 15–31

  26. Liang T, Reynolds A, Tinelli C, Barrett C, Deters M (2014) A dpll(t) theory solver for a theory of strings and regular expressions. In: Proceedings of the 26th international conference on computer aided verification, CAV’14, pp 646–662

  27. Makanin GS (1977) The problem of solvability of equations in a free semigroup. Math Sbornik 103:147–236 (1977). English transl. in Math USSR Sbornik 32

  28. Matiyasevich Y (2007) Word equations, Fibonacci numbers, and Hilbert’s tenth problem. In: Workshop on Fibonacci words

  29. Plandowski W (2004) Satisfiability of word equations with constants is in pspace. J ACM 51(3):483–496

    Article  MathSciNet  MATH  Google Scholar 

  30. Plandowski W (2006) An efficient algorithm for solving word equations. In: Proceedings of the 38th annual ACM symposium on theory of computing, STOC ’06, pp 467–476

  31. Qu X, Robinson B (2011) A case study of concolic testing tools and their limitations. In: 2011 international symposium on empirical software engineering and measurement (ESEM). IEEE Computer Society, pp 117–126

  32. Redelinghuys G, Visser W, Geldenhuys J (2012) Symbolic execution of programs with strings. In: Proceedings of the South African Institute for computer scientists and information technologists conference, SAICSIT ’12, pp 139–148

  33. Saxena P, Akhawe D, Hanna S, Mao F, McCamant S, Song D (2010) A symbolic execution framework for javascript. In: Proceedings of the 2010 IEEE symposium on security and privacy, SP ’10, pp 513–528

  34. Schulz K (1992) Makanin’s algorithm for word equations-two improvements and a generalization. In: Schulz K (ed) Word equations and related topics, Lecture Notes in Computer Science, vol 572. Springer, Berlin, pp 85–150

    Chapter  Google Scholar 

  35. Tateishi T, Pistoia M, Tripp O (2013) Path- and index-sensitive string analysis based on monadic second-order logic. ACM Trans Softw Eng Methodol 22(4):33:1–33:33

    Article  Google Scholar 

  36. Trinh M-T, Chu D-H, Jaffar J (2014) S3: a symbolic string solver for vulnerability detection in web applications. In: Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, CCS ’14, pp 1232–1243

  37. Xie X, Liu Y, Le W, Li X, Chen H (2015) S-looper: automatic summarization for multipath string loops. In: Proceedings of the 2015 international symposium on software testing and analysis, ISSTA 2015. ACM, New York, pp 188–198

  38. Yu F, Alkhalaf M, Bultan T (2010) Stranger: an automata-based string analysis tool for php. In: Proceedings of the 16th international conference on tools and algorithms for the construction and analysis of systems, TACAS’10, pp 154–157

  39. Yu F, Bultan T, Ibarra OH (2009) Symbolic string verification: combining string analysis and size analysis. In: Proceedings of the 15th international conference on tools and algorithms for the construction and analysis of systems, TACAS ’09, pp 322–336

  40. Zheng Y, Ganesh V, Subramanian S, Tripp O, Dolby J, Zhang X (2015) Effective search-space pruning for solvers of string equations, regular expressions and length constraints. Technical report. https://sites.google.com/site/z3strsolver/publications

  41. Zheng Y, Zhang X, Ganesh V (2013) Z3-str: a z3-based string solver for web application analysis. In: Proceedings of the 2013 9th joint meeting on foundations of software engineering, ESEC/FSE 2013, pp 114–124

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yunhui Zheng.

Additional information

O. Tripp: The research leading to this manuscript, and earlier versions of the manuscript, were completed while the author was at IBM Research.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Zheng, Y., Ganesh, V., Subramanian, S. et al. Z3str2: an efficient solver for strings, regular expressions, and length constraints. Form Methods Syst Des 50, 249–288 (2017). https://doi.org/10.1007/s10703-016-0263-6

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10703-016-0263-6

Keywords

Navigation