Abstract
Inferring the input grammar accepted by a program is central to a variety of software engineering problems, including parser verification, grammar-based fuzzing, communication protocol inference, and documentation. Sound and complete active learning techniques have been developed for several classes of languages and their corresponding automaton representations; however, outstanding challenges still limit their effective application to the inference of input grammars. We focus on active learning techniques based on \(L^*\) and propose two extensions of the Minimally Adequate Teacher framework that allow the efficient learning of the input language of a program in the form of symbolic automata, leveraging the additional information that can be extracted from concolic execution. Building on these extensions, we develop two learning algorithms that significantly reduce the number of queries required to converge to the correct hypothesis.
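To make the Minimally Adequate Teacher setting concrete, the following is a minimal Python sketch, assuming the program under test can be modelled as a Boolean function over strings. A symbolic automaton labels each transition with a character predicate rather than a single symbol, so one edge can cover a large alphabet. The teacher answers membership queries by running the program and approximates equivalence queries by bounded-exhaustive testing over a small sample alphabet. The names `SymbolicAutomaton`, `Teacher`, and `program` are hypothetical, guards are plain Python callables rather than predicates from an effective Boolean algebra, and the bounded check is a stand-in: it is neither an \(L^*\) learner nor the concolic-execution-based technique proposed in the chapter.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Callable, Dict, List, Optional, Set, Tuple

# Assumption: a guard is any Python predicate over a single character.
Predicate = Callable[[str], bool]


@dataclass
class SymbolicAutomaton:
    """Deterministic symbolic automaton: edges carry character predicates."""
    initial: int
    accepting: Set[int]
    # state -> list of (guard, target); guards leaving a state are assumed disjoint.
    transitions: Dict[int, List[Tuple[Predicate, int]]] = field(default_factory=dict)

    def accepts(self, word: str) -> bool:
        state = self.initial
        for ch in word:
            for guard, target in self.transitions.get(state, []):
                if guard(ch):
                    state = target
                    break
            else:
                return False  # no guard matched: implicit rejecting sink
        return state in self.accepting


class Teacher:
    """Hypothetical MAT wrapping a program, with a bounded equivalence check."""

    def __init__(self, program: Callable[[str], bool], sample: str, max_len: int):
        self.program = program
        self.sample = sample      # finite sample alphabet used only for testing
        self.max_len = max_len    # bound on the length of tested words
        self.membership_queries = 0

    def membership(self, word: str) -> bool:
        # Membership query: run the program on the word.
        self.membership_queries += 1
        return self.program(word)

    def equivalence(self, hyp: SymbolicAutomaton) -> Optional[str]:
        # Approximate equivalence query: return the first word (shortest first)
        # on which hypothesis and program disagree, or None if none is found.
        for n in range(self.max_len + 1):
            for chars in product(self.sample, repeat=n):
                word = "".join(chars)
                if hyp.accepts(word) != self.program(word):
                    return word
        return None


def program(s: str) -> bool:
    """Toy program under test: a letter followed by letters or digits."""
    return len(s) > 0 and s[0].isalpha() and all(c.isalnum() for c in s)


ident = SymbolicAutomaton(
    initial=0, accepting={1},
    transitions={0: [(str.isalpha, 1)], 1: [(str.isalnum, 1)]},
)
letters_only = SymbolicAutomaton(
    initial=0, accepting={1},
    transitions={0: [(str.isalpha, 1)], 1: [(str.isalpha, 1)]},
)

teacher = Teacher(program, sample="a1_", max_len=3)
print(teacher.equivalence(ident))         # prints None
print(teacher.equivalence(letters_only))  # prints a1
```

The counterexample `a1` is the kind of feedback an \(L^*\)-style learner would use to refine its hypothesis; with predicate-labelled edges, the refinement may split a guard rather than add a state.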
This work has been partially supported by the EPSRC HiPEDS Centre for Doctoral Training (EP/L016796/1), the DSI-NRF Centre of Excellence in Mathematical and Statistical Sciences (CoE-MaSS), and a Royal Society Newton Mobility Grant (NMG\R2\170142).
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2020 The Author(s)
About this paper
Cite this paper
Clun, D., van Heerden, P., Filieri, A., Visser, W. (2020). Improving Symbolic Automata Learning with Concolic Execution. In: Wehrheim, H., Cabot, J. (eds) Fundamental Approaches to Software Engineering. FASE 2020. Lecture Notes in Computer Science, vol 12076. Springer, Cham. https://doi.org/10.1007/978-3-030-45234-6_1
DOI: https://doi.org/10.1007/978-3-030-45234-6_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-45233-9
Online ISBN: 978-3-030-45234-6
eBook Packages: Computer Science, Computer Science (R0)