Appropriate test data are crucial to successful fuzz testing. Most real-world applications, however, accept complex structured inputs in which data are surrounded by meta-data, and these inputs are processed in several stages, including parsing and rendering (execution). The complex structure of such input files makes it difficult to generate effective test data automatically. The success of deep learning in complex tasks, particularly generative tasks, has motivated us to exploit it for generating test data for complicated structures such as PDF files. To this end, a neural language model (NLM) based on deep recurrent neural networks (RNNs) is used to learn the structure of complex inputs. To target both the parsing and rendering steps of the software under test (SUT), our approach generates new test data while distinguishing between data and meta-data, which significantly improves input fuzzing. To assess the proposed approach, we have developed a modular file format fuzzer, IUST-DeepFuzz. Our experimental results show that IUST-DeepFuzz achieves relatively high code coverage of MuPDF compared with state-of-the-art tools such as learn&fuzz, AFL, Augmented-AFL, and random fuzzing. In summary, our experiments with many deep learning models revealed that the simpler the deep learning model used to generate test data, the higher the code coverage of the software under test.
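The idea of learning an input format and then fuzzing data and meta-data separately can be illustrated with a minimal sketch. It is not the IUST-DeepFuzz implementation: a character-level n-gram model stands in for the RNN-based language model, the three training objects are hand-written toy samples, and the names `train`, `generate`, `fuzz`, `PATTERNS`, and `BOUNDARY_VALUES` are hypothetical. Treating integers as data and `/Name` tokens as meta-data is likewise a simplifying assumption made for this example.

```python
import random
import re
from collections import Counter, defaultdict

# Toy training corpus: hand-written PDF-like objects (hypothetical samples).
CORPUS = [
    "obj\n<< /Type /Page /Count 3 /Rotate 90 >>\nendobj\n",
    "obj\n<< /Type /Font /Size 12 /Width 600 >>\nendobj\n",
    "obj\n<< /Type /Page /Count 7 /Rotate 180 >>\nendobj\n",
]
ORDER = 4  # number of preceding characters the model conditions on

# Assumption for this sketch: integers are data, /Name tokens are meta-data.
PATTERNS = {"data": re.compile(r"\d+"), "metadata": re.compile(r"/[A-Za-z]+")}
BOUNDARY_VALUES = ["0", "-1", "2147483648"]


def train(corpus, order=ORDER):
    """Count which character follows each `order`-character context."""
    model = defaultdict(Counter)
    for text in corpus:
        for i in range(len(text) - order):
            model[text[i:i + order]][text[i + order]] += 1
    return model


def generate(model, rng, prefix="obj\n", end="endobj", max_len=200, order=ORDER):
    """Sample one character at a time until the end marker or max_len."""
    out = prefix
    while len(out) < max_len and not out.endswith(end):
        counts = model.get(out[-order:])
        if not counts:  # unseen context: stop rather than guess
            break
        chars, weights = zip(*sorted(counts.items()))
        out += rng.choices(chars, weights=weights)[0]
    return out


def fuzz(text, rng, target):
    """Mutate one token of the chosen class ("data" or "metadata")."""
    matches = list(PATTERNS[target].finditer(text))
    if not matches:
        return text
    m = rng.choice(matches)
    if target == "data":
        new = rng.choice(BOUNDARY_VALUES)  # swap a value for a boundary value
    else:
        new = m.group()[:-1]  # truncate a name, e.g. /Type becomes /Typ
    return text[:m.start()] + new + text[m.end():]


if __name__ == "__main__":
    rng = random.Random(0)
    sample = generate(train(CORPUS), rng)
    print(sample)
    print(fuzz(sample, rng, "data"))
    print(fuzz(sample, rng, "metadata"))
```

In this sketch, a mutated name is the kind of input likely to be rejected or mishandled by a parser, while a mutated value keeps the structure intact so that it can reach later processing stages. That reading of the data and meta-data distinction is an interpretation of the abstract, not a description of the authors' algorithms.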
The complete source code and documentation of IUST-DeepFuzz are available on our GitHub repository https://github.com/m-zakeri/iust_deep_fuzz.
The IUST-PDFCorpus is available on Zenodo: https://zenodo.org/record/3484013.
Independent and identically distributed.
This release is available for download at https://mupdf.com/release_history.html.
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Zakeri Nasrabadi, M., Parsa, S. & Kalaee, A. Format-aware learn&fuzz: deep test data generation for efficient fuzzing. Neural Comput & Applic (2020). https://doi.org/10.1007/s00521-020-05039-7
Keywords
- Test data generation
- File format fuzzing
- Code coverage
- Neural language model
- Recurrent neural network
- Deep learning