EvadePDF: Towards Evading Machine Learning Based PDF Malware Classifiers

Dey, Sukanta; Kumar, Abhishek; Sawarkar, Mehul; Singh, Pranav Kumar; Nandi, Sukumar

doi:10.1007/978-981-13-7561-3_11

Part of the book series: Communications in Computer and Information Science (CCIS, volume 939)

Included in the following conference series:

International Conference on Security & Privacy

Abstract

There have been significant recent developments in applying Machine Learning (ML) based classifiers, such as PDFRate, to identify malware camouflaged as benign files (our study focuses on PDF files). However, unlike other fields where statistical techniques are used, malware detection lacks a fundamental assumption of ML-based techniques: that the training data is representative of the prospective input. Instead, an adversary can design malware specifically to break ML classifiers, so that it appears as an anomaly with respect to the training data. We present a thorough study of, and the results of our improvements to, the implementation of one such prominent project, EvadeML, a Genetic Programming based technique for evading ML-based malware classifiers. EvadeML has shown a 100% success rate against two target PDF malware classifiers, PDFRate and Hidost. We modified EvadeML to achieve better evasion efficiency against another PDF malware classifier, AnalyzePDF, and found a significant improvement over the original EvadeML. We also tested our modified approach against the PDFRate malware classifier and obtained a 100% success rate, as with the original EvadeML.
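The abstract describes EvadeML as a Genetic Programming based search for evasive variants. As a rough illustration of the general shape of such a search (mutate candidates, discard those an oracle says no longer behave maliciously, and keep those a classifier scores as more benign), here is a minimal Python sketch under heavy simplifying assumptions. It is not EvadeML's or EvadePDF's code: variants are toy bit vectors of abstract features rather than PDF files, and WEIGHTS, REQUIRED, score, oracle, and evolve are hypothetical names standing in for a real classifier and a real behavioural check.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical toy setup (not EvadeML/EvadePDF code): a variant is a bit vector
# of abstract features, where 1 = feature present and 0 = feature absent.
Variant = Tuple[int, ...]
WEIGHTS: List[int] = [4, 4, 3, 3, 2, 2, -3, -3, -3, -3, -1, -1]
REQUIRED: Tuple[int, ...] = (0, 1)  # features the toy "payload" depends on


def score(v: Variant) -> int:
    """Toy linear classifier score; > 0 means the variant is flagged malicious."""
    return sum(w * x for w, x in zip(WEIGHTS, v))


def is_malicious(v: Variant) -> bool:
    return score(v) > 0


def oracle(v: Variant) -> bool:
    """Toy stand-in for a behavioural check: required features must survive."""
    return all(v[i] == 1 for i in REQUIRED)


def fitness(v: Variant) -> float:
    """Variants failing the oracle are worthless; otherwise a lower score is fitter."""
    return float("-inf") if not oracle(v) else float(-score(v))


def mutate(v: Variant, rng: random.Random) -> Variant:
    """Flip one randomly chosen feature on or off."""
    i = rng.randrange(len(v))
    return v[:i] + (1 - v[i],) + v[i + 1:]


def crossover(a: Variant, b: Variant, rng: random.Random) -> Variant:
    """Single-point crossover: prefix of one parent, suffix of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(seed: Variant, pop_size: int = 20, generations: int = 300,
           rng_seed: int = 0) -> Optional[Variant]:
    """Return an oracle-passing variant the toy classifier labels benign, or None."""
    rng = random.Random(rng_seed)
    population = [mutate(seed, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best = population[0]
        # Sorted by fitness, so if any evasive variant exists, the best one is evasive.
        if oracle(best) and not is_malicious(best):
            return best
        parents = population[: pop_size // 2]  # truncation selection with elitism
        children = [mutate(best, rng)]  # always explore around the current best
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return None
```

For example, evolve((1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0)) starts from a variant the toy classifier flags as malicious (score 18) and returns a variant that still has features 0 and 1 set, so the toy oracle passes, while scoring as benign, or None if the generation budget runs out. Because the fitter half of each generation always survives, the best fitness never decreases from one generation to the next.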

References

2018 internet security threat report. https://www.symantec.com/security-center/threat-report
AnalyzePDF - bringing the dirt up to the surface. https://hiddenillusion.github.io/2013/12/03/analyzepdf-bringing-dirt-up-to-surface/
CVE details. Adobe acrobat reader—CVE security vulnerabilities, versions and detailed reports. https://www.cvedetails.com/product/497
Jaff ransomware hiding in a PDF document. https://www.vmray.com/cyber-security-blog/jaff-ransomware-hiding-in-a-pdf-document/
Yara rules. https://github.com/Yara-Rules/rules
Banzhaf, W., Nordin, P., Keller, R.E., Francone, F.D.: Genetic Programming: An Introduction, vol. 1. Morgan Kaufmann, San Francisco (1998)
Chenette, S.: Malicious documents archive for signature testing and research - contagio malware dump. http://contagiodump.blogspot.com/2010/08/malicious-documents-archive-for.html
Dahl, G.E., Stokes, J.W., Deng, L., Yu, D.: Large-scale malware classification using random projections and neural networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2013, Vancouver, BC, Canada, 26–31 May 2013, pp. 3422–3426 (2013). https://doi.org/10.1109/ICASSP.2013.6638293
Dang, H., Huang, Y., Chang, E.C.: Evading classifiers by morphing in the dark. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 119–133. ACM (2017)
Gonzalez, L.E., Vázquez, R.A.: Malware classification using Euclidean distance and artificial neural networks. In: 12th Mexican International Conference on Artificial Intelligence, MICAI 2013, México, Mexico, 24–30 November 2013, pp. 103–108 (2013). Special Session Proceedings. https://doi.org/10.1109/MICAI.2013.18
Grosse, K., Manoharan, P., Papernot, N., Backes, M., McDaniel, P.D.: On the (statistical) detection of adversarial examples. CoRR abs/1702.06280 (2017). http://arxiv.org/abs/1702.06280
Russu, P., Demontis, A., Biggio, B., Fumera, G., Roli, F.: Secure kernel machines against evasion attacks. In: Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security, AISec@CCS 2016, Vienna, Austria, 28 October 2016, pp. 59–69 (2016). https://doi.org/10.1145/2996758.2996771
Smutz, C., Stavrou, A.: Malicious PDF detection using metadata and structural features. In: 28th Annual Computer Security Applications Conference, ACSAC 2012, Orlando, FL, USA, 3–7 December 2012, pp. 239–248 (2012). https://doi.org/10.1145/2420950.2420987
Srndic, N., Laskov, P.: Detection of malicious PDF files based on hierarchical document structure. In: 20th Annual Network and Distributed System Security Symposium, NDSS 2013, San Diego, California, USA, 24–27 February 2013 (2013). https://www.ndss-symposium.org/ndss2013/detection-malicious-pdf-files-based-hierarchical-document-structure
Tong, L., Li, B., Hajaj, C., Vorobeychik, Y.: Feature conservation in adversarial classifier evasion: a case study. CoRR abs/1708.08327 (2017). http://arxiv.org/abs/1708.08327
Xu, W., Qi, Y., Evans, D.: Automatically evading classifiers: a case study on PDF malware classifiers. In: 23rd Annual Network and Distributed System Security Symposium, NDSS 2016, San Diego, California, USA, 21–24 February 2016 (2016). http://wp.internetsociety.org/ndss/wp-content/uploads/sites/25/2017/09/automatically-evading-classifiers.pdf

Acknowledgement

This research was conducted in the Information Security Education and Awareness (ISEA) Lab of the Indian Institute of Technology Guwahati, Assam, India. The authors acknowledge IIT Guwahati, ISEA, and the Ministry of Electronics and Information Technology (MeitY), Government of India, for their support.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Guwahati, Guwahati, 781039, Assam, India
Sukanta Dey, Abhishek Kumar, Mehul Sawarkar, Pranav Kumar Singh & Sukumar Nandi

Corresponding author

Correspondence to Sukanta Dey.

Editor information

Editors and Affiliations

Indian Institute of Technology Guwahati, Guwahati, India
Sukumar Nandi
Indian Institute of Technology Jammu, Jammu, India
Devesh Jinwala
Indian Institute of Technology Bombay, Mumbai, India
Virendra Singh
Malaviya National Institute of Technology, Jaipur, India
Vijay Laxmi
Indian Institute of Technology Jammu, Jammu, Jammu and Kashmir, India
Manoj Singh Gaur
Department of Technical Education, Government of Gujarat, Rajkot, India
Parvez Faruki

About this paper

Cite this paper

Dey, S., Kumar, A., Sawarkar, M., Singh, P.K., Nandi, S. (2019). EvadePDF: Towards Evading Machine Learning Based PDF Malware Classifiers. In: Nandi, S., Jinwala, D., Singh, V., Laxmi, V., Gaur, M., Faruki, P. (eds) Security and Privacy. ISEA-ISAP 2019. Communications in Computer and Information Science, vol 939. Springer, Singapore. https://doi.org/10.1007/978-981-13-7561-3_11

DOI: https://doi.org/10.1007/978-981-13-7561-3_11
Published: 30 April 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-7560-6
Online ISBN: 978-981-13-7561-3
eBook Packages: Computer Science, Computer Science (R0)
