EvadePDF: Towards Evading Machine Learning Based PDF Malware Classifiers

  • Conference paper
Security and Privacy (ISEA-ISAP 2019)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 939))

Abstract

In recent times, there have been significant developments in the application of Machine Learning (ML) based classifiers, such as PDFRate, for identifying malware camouflaged as benign files; our study focuses on PDF files. However, unlike other fields where statistical techniques are used, malware detection violates the fundamental assumption of ML-based techniques that the training data is representative of prospective inputs. Instead, malware can be specifically designed as an anomaly that breaks the ML classifier. We present a thorough study of EvadeML, a prominent Genetic Programming based technique for evading ML-based malware classifiers, together with the results of our improvements over its implementation. EvadeML has shown a 100% success rate against two target PDF malware classifiers, PDFRate and Hidost. We modified EvadeML to achieve better evasion efficiency against another PDF malware classifier, AnalyzePDF, and found a significant improvement over the original EvadeML. We also tested our modified approach against the PDFRate malware classifier and found a 100% success rate, as with the original EvadeML.
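
To illustrate the general shape of a genetic search against a scoring classifier, the following is a minimal toy sketch, not the EvadeML or EvadePDF implementation. Everything in it is an assumption made for illustration: samples are abstract binary feature vectors rather than PDF files, `toy_score` is a hypothetical stand-in for a classifier, and the names `mutate` and `evolve`, the population size, the mutation rate, and the threshold are all invented here. It omits anything specific to PDFs.

```python
import random


def toy_score(features):
    """Hypothetical stand-in for a classifier: a weighted sum in [0, 1]."""
    weights = [0.5, 0.3, 0.2, 0.0]
    return sum(w * f for w, f in zip(weights, features))


def mutate(features, rng, rate=0.25):
    """Return a copy in which each binary feature is flipped with probability `rate`."""
    return [1 - f if rng.random() < rate else f for f in features]


def evolve(seed, score, rng, pop_size=20, generations=50, threshold=0.5):
    """Toy genetic search: mutate, rank by score, keep the lower-scoring half.

    Returns (variant, generation) for the first variant whose score falls
    below `threshold`, or (None, generations) if none is found.
    """
    population = [seed[:] for _ in range(pop_size)]
    for gen in range(generations):
        # Variation: every individual is replaced by a mutated copy.
        population = [mutate(ind, rng) for ind in population]
        # Selection: lower score means fitter in this toy setting.
        population.sort(key=score)
        best = population[0]
        if score(best) < threshold:
            return best, gen
        survivors = population[: pop_size // 2]
        refill = [rng.choice(survivors)[:] for _ in range(pop_size - len(survivors))]
        population = survivors + refill
    return None, generations


rng = random.Random(0)
best, generation = evolve([1, 1, 1, 1], toy_score, rng)
```

The loop alternates variation (random mutation) with selection (keeping the lower-scoring half), which is the basic pattern that genetic approaches share; the actual systems discussed in the abstract operate on real file structure and real classifiers, which this sketch does not model.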

Acknowledgement

The research work has been conducted in the Information Security Education and Awareness (ISEA) Lab of Indian Institute of Technology, Guwahati, Assam, India. The authors would like to acknowledge IIT Guwahati, ISEA, and Ministry of Electronics and Information Technology (MeitY), Government of India for the support.

Author information

Corresponding author

Correspondence to Sukanta Dey.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Dey, S., Kumar, A., Sawarkar, M., Singh, P.K., Nandi, S. (2019). EvadePDF: Towards Evading Machine Learning Based PDF Malware Classifiers. In: Nandi, S., Jinwala, D., Singh, V., Laxmi, V., Gaur, M., Faruki, P. (eds) Security and Privacy. ISEA-ISAP 2019. Communications in Computer and Information Science, vol 939. Springer, Singapore. https://doi.org/10.1007/978-981-13-7561-3_11

  • DOI: https://doi.org/10.1007/978-981-13-7561-3_11

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-7560-6

  • Online ISBN: 978-981-13-7561-3

  • eBook Packages: Computer Science (R0)
