
Harnessing Static Analysis to Help Learn Pseudo-Inverses of String Manipulating Procedures for Automatic Test Generation

  • Oren Ish-Shalom
  • Shachar Itzhaky
  • Roman Manevich
  • Noam Rinetzky
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11990)

Abstract

We present a novel approach based on supervised machine learning for inverting String Manipulating Procedures (SMPs), i.e., given an SMP \(p:\overline{\Sigma }\rightarrow \overline{\Sigma }\), we compute a partial pseudo-inverse function \(p^{-1}\) such that for a target string \(t\in \overline{\Sigma }\), if \(p^{-1}(t)\ne \bot \) then \(p(p^{-1}(t))=t\). The problem is motivated by the difficulty modern symbolic execution tools, e.g., KLEE, have in executing the loops inside SMPs so that they produce the specific output required to enter a particular branch. We thus find ourselves in the pleasant situation where program analysis assists machine learning, which in turn helps program analysis.
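To make the pseudo-inverse contract concrete, the following Python sketch (our own illustration, not code from the paper) checks that a candidate inverter respects it; satisfies_contract, the toy procedure p, and its hand-written inverse p_inv are hypothetical names used only for this example.

    def satisfies_contract(smp, pseudo_inverse, target: str) -> bool:
        """If the pseudo-inverse answers at all (None stands for bottom),
        running the procedure on that answer must reproduce the target exactly."""
        candidate = pseudo_inverse(target)
        if candidate is None:          # p^{-1}(t) = bottom: the inverter may give up
            return True
        return smp(candidate) == target

    # Toy example: p appends a marker character; a pseudo-inverse strips it when present.
    p = lambda s: s + '!'
    p_inv = lambda t: t[:-1] if t.endswith('!') else None
    assert satisfies_contract(p, p_inv, 'hello!')   # p(p_inv('hello!')) == 'hello!'
    assert satisfies_contract(p, p_inv, 'hello')    # p_inv gives up; returning bottom is allowed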

Our basic attack on the problem is to train a machine learning algorithm on (output, input) pairs generated by executing \(p\) on random inputs. Unfortunately, naively applying this technique is extremely expensive due to the size of the alphabet. To remedy this, we present a specialized static analysis that can drastically reduce the size of the alphabet \(\Sigma \) from which examples are drawn without sacrificing the ability to cover all the behaviors of the analyzed procedure. Our key observation is that a procedure often treats many characters in a uniform way: it only copies them from the input to the output in an order-preserving fashion. Our static analysis finds these good characters so that the learning algorithm may draw its examples from a reduced alphabet containing a single representative good character, allowing us to produce smaller models from fewer examples than the full alphabet would require. We then use the learned pseudo-inverse function to invert specific desired outputs by translating a given query to and from the reduced alphabet.
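The pipeline described above can be summarized by the following Python sketch (again our own illustration under simplifying assumptions, not the paper's implementation). SPECIAL stands for the characters the analyzed procedure actually inspects, REPR for the single representative good character, and make_examples, to_reduced, from_reduced, invert, and quote are hypothetical helper names.

    import random

    SPECIAL = {'\\', '"', ','}     # assumed: characters the procedure branches on
    REPR = 'a'                     # single representative for all "good" characters
    REDUCED = sorted(SPECIAL) + [REPR]

    def to_reduced(s: str) -> str:
        """Collapse every good character to the single representative."""
        return ''.join(c if c in SPECIAL else REPR for c in s)

    def make_examples(smp, n: int, max_len: int = 8):
        """Run the procedure on random inputs drawn from the reduced alphabet
        and record (output, input) pairs as supervised training examples."""
        examples = []
        for _ in range(n):
            length = random.randint(0, max_len)
            inp = ''.join(random.choice(REDUCED) for _ in range(length))
            examples.append((smp(inp), inp))
        return examples

    def from_reduced(reduced_inp: str, target: str) -> str:
        """Good characters are only copied through in order, so substitute the
        target's good characters back for the representatives, position by position."""
        good = [c for c in target if c not in SPECIAL]
        out, i = [], 0
        for c in reduced_inp:
            if c == REPR and i < len(good):
                out.append(good[i])
                i += 1
            else:
                out.append(c)
        return ''.join(out)

    def invert(target: str, learned_inverse, smp):
        """Answer a concrete query: invert over the reduced alphabet, translate back,
        and keep the answer only if the procedure really reproduces the target."""
        reduced = learned_inverse(to_reduced(target))
        if reduced is None:
            return None
        candidate = from_reduced(reduced, target)
        return candidate if smp(candidate) == target else None

    # Hypothetical SMP used for illustration: quote the input, escaping embedded quotes.
    def quote(s: str) -> str:
        return '"' + s.replace('"', '\\"') + '"'

    examples = make_examples(quote, n=1000)   # (output, input) pairs for training

In this sketch the back-translation simply substitutes the target's good characters, in order, for the representatives produced by the learned inverse, and the final check via the procedure itself guards against mispredictions; the paper's actual translation and alignment are more careful than this simplification.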

We implemented our approach using two machine learning algorithms and show that our string inverters find inputs that drive a selection of procedures taken from real-life software to produce desired outputs, whereas KLEE, a state-of-the-art symbolic execution engine, fails to find such inputs.

Notes

Acknowledgments

This research was sponsored by the Len Blavatnik and the Blavatnik Family Foundation, the Blavatnik Interdisciplinary Cyber Research Center at Tel Aviv University, the Pazy Foundation, and the Israel Science Foundation (ISF) grant No. 1996/18.

References

  1.
  2. Cadar, C., Dunbar, D., Engler, D.: KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs. In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI 2008, pp. 209–224. USENIX Association, Berkeley (2008)
  3. Chen, W., Udding, J.T.: Program inversion: more than fun! Sci. Comput. Program. 15(1), 1–13 (1990)
  4. de Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78800-3_24
  5. Dijkstra, E.W.: Program inversion. In: Bauer, F.L., et al. (eds.) Program Construction. LNCS, vol. 69, pp. 54–57. Springer, Heidelberg (1979). https://doi.org/10.1007/BFb0014657. http://dl.acm.org/citation.cfm?id=647639.733360
  6. Ganesh, V.: Decision procedures for bit-vectors, arrays and integers. Ph.D. thesis, Stanford, CA, USA (2007). AAI3281841
  7. Garg, P., Neider, D., Madhusudan, P., Roth, D.: Learning invariants using decision trees and implication counterexamples. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, St. Petersburg, FL, USA, 20–22 January 2016, pp. 499–512 (2016)
  8. Glück, R., Kawabe, M.: A method for automatic program inversion based on LR(0) parsing. Fundam. Inform. 66, 367–395 (2005)
  9. Grigore, R., Yang, H.: Abstraction refinement guided by a learnt probabilistic model. SIGPLAN Not. 51(1), 485–498 (2016). https://doi.org/10.1145/2914770.2837663
  10. Gulwani, S.: Automating string processing in spreadsheets using input-output examples. SIGPLAN Not. 46(1), 317–330 (2011)
  11. Gulwani, S.: Programming by examples: applications, algorithms, and ambiguity resolution. In: Olivetti, N., Tiwari, A. (eds.) IJCAR 2016. LNCS (LNAI), vol. 9706, pp. 9–14. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-40229-1_2
  12. Heo, K., Oh, H., Yang, H.: Learning a variable-clustering strategy for octagon from labeled data generated by a static analysis. In: Rival, X. (ed.) SAS 2016. LNCS, vol. 9837, pp. 237–256. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53413-7_12
  13. Hu, Q., D'Antoni, L.: Automatic program inversion using symbolic transducers. In: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, pp. 376–389. ACM, New York (2017). https://doi.org/10.1145/3062341.3062345
  14. Oncina, J., García, P., Vidal, E.: Learning subsequential transducers for pattern recognition interpretation tasks. IEEE Trans. Pattern Anal. Mach. Intell. 15(5), 448–458 (1993)
  15. Kanade, A., Alur, R., Rajamani, S., Ramalingam, G.: Representation dependence testing using program inversion. In: Proceedings of the Eighteenth ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2010, pp. 277–286. ACM, New York (2010). https://doi.org/10.1145/1882291.1882332
  16. Kawabe, M., Glück, R.: The program inverter LRinv and its structure. In: Hermenegildo, M.V., Cabeza, D. (eds.) PADL 2005. LNCS, vol. 3350, pp. 219–234. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-30557-6_17
  17. Kiezun, A., Ganesh, V., Guo, P.J., Hooimeijer, P., Ernst, M.D.: HAMPI: a solver for string constraints. In: Proceedings of the Eighteenth International Symposium on Software Testing and Analysis, ISSTA 2009, pp. 105–116. ACM, New York (2009). https://doi.org/10.1145/1572272.1572286
  18. Lattner, C., Adve, V.: LLVM: a compilation framework for lifelong program analysis & transformation. In: Proceedings of the International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization, CGO 2004, p. 75. IEEE Computer Society, Washington, DC (2004)
  19. Miltner, A., Fisher, K., Pierce, B.C., Walker, D., Zdancewic, S.: Synthesizing bijective lenses. Proc. ACM Program. Lang. 2(POPL), 1:1–1:30 (2017). https://doi.org/10.1145/3158089
  20. Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48(3), 443–453 (1970)
  21. Nori, A.V., Rajamani, S.K.: Program analysis and machine learning: a win-win deal. In: Yahav, E. (ed.) SAS 2011. LNCS, vol. 6887, pp. 2–3. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23702-7_2
  22. Nori, A.V., Sharma, R.: Termination proofs from tests. In: Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2013, pp. 246–256. ACM, New York (2013). https://doi.org/10.1145/2491411.2491413
  23. Octeau, D., et al.: Combining static analysis with probabilistic models to enable market-scale Android inter-component analysis. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, pp. 469–484. ACM, New York (2016). https://doi.org/10.1145/2837614.2837661
  24. Oh, H., Yang, H., Yi, K.: Learning a strategy for adapting a program analysis via Bayesian optimisation. In: Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2015, pp. 572–588. ACM, New York (2015). https://doi.org/10.1145/2814270.2814309
  25. Raychev, V., Bielik, P., Vechev, M., Krause, A.: Learning programs from noisy data. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, pp. 761–774. ACM, New York (2016). https://doi.org/10.1145/2837614.2837671
  26. Ross, B.J.: Running programs backwards: the logical inversion of imperative computation. Formal Aspects Comput. 9(3), 331–348 (1997)
  27. Sankaranarayanan, S., Chaudhuri, S., Ivančić, F., Gupta, A.: Dynamic inference of likely data preconditions over predicates by tree learning. In: Proceedings of the 2008 International Symposium on Software Testing and Analysis, ISSTA 2008, pp. 295–306. ACM, New York (2008). https://doi.org/10.1145/1390630.1390666
  28. Sankaranarayanan, S., Ivančić, F., Gupta, A.: Mining library specifications using inductive logic programming. In: Proceedings of the 30th International Conference on Software Engineering, ICSE 2008, pp. 131–140. ACM, New York (2008). https://doi.org/10.1145/1368088.1368107
  29. Schoenmakers, B.: Inorder traversal of a binary heap and its inversion in optimal time and space. In: Bird, R.S., Morgan, C.C., Woodcock, J.C.P. (eds.) MPC 1992. LNCS, vol. 669, pp. 291–301. Springer, Heidelberg (1993). https://doi.org/10.1007/3-540-56625-2_19
  30. Singh, R., Gulwani, S.: Learning semantic string transformations from examples. Proc. VLDB Endow. 5(8), 740–751 (2012). https://doi.org/10.14778/2212351.2212356
  31. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, Montreal, Quebec, Canada, 8–13 December 2014, pp. 3104–3112 (2014)
  32. Yi, K., Choi, H., Kim, J., Kim, Y.: An empirical study on classification methods for alarms from a bug-finding static C analyzer. Inf. Process. Lett. 102(2–3), 118–123 (2007). https://doi.org/10.1016/j.ipl.2006.11.004
  33. Zaremba, W., Sutskever, I.: Learning to execute. CoRR abs/1410.4615 (2014)
  34. Zheng, Y., Ganesh, V., Subramanian, S., Tripp, O., Dolby, J., Zhang, X.: Effective search-space pruning for solvers of string equations, regular expressions and length constraints. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9206, pp. 235–254. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21690-4_14

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Tel Aviv University, Tel Aviv, Israel
  2. Technion, Haifa, Israel
  3. Ben-Gurion University of the Negev, Beersheba, Israel
