Malicious PDF Files Detection Using Structural and Javascript Based Features

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 750)

Abstract

Malicious PDF files have recently come to be considered one of the most dangerous threats to system security. The flexible, code-bearing nature of the PDF format enables an attacker to execute malicious code on a user's computer system. Many solutions have been developed by security agents to protect users' systems, but they remain inadequate. In this paper, we propose a method for detecting malicious PDF files using a machine learning approach. The proposed method extracts features from the PDF file structure and from embedded JavaScript code, relying on an advanced parsing mechanism. Instead of searching the PDF content for a specific attack, which is a quite complex procedure, we extract features that are often used in attacks. Moreover, we present experimental evidence for the choice of learning algorithm, which provides remarkably high accuracy compared to other existing methods.
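To make the idea of structural features concrete, the following is a minimal sketch, not the authors' implementation. It counts a few PDF name keywords that are commonly associated with active content. The keyword list, the function name `structural_features`, and the raw byte-scanning approach are all assumptions made for illustration; the abstract does not specify which features or which parser the paper uses.

```python
import re

# Hypothetical keyword list (an assumption, not taken from the paper):
# PDF names often inspected when triaging documents for active content.
KEYWORDS = [b"/JS", b"/JavaScript", b"/OpenAction", b"/AA", b"/Launch"]


def structural_features(data: bytes) -> dict:
    """Count keyword occurrences and indirect objects in raw PDF bytes."""
    features = {}
    for kw in KEYWORDS:
        # The negative lookahead stops a short name such as /AA from
        # matching inside a longer name that merely starts with it.
        pattern = re.escape(kw) + rb"(?![A-Za-z0-9])"
        features[kw.decode("ascii")] = len(re.findall(pattern, data))
    # "obj" as a whole word counts object headers ("1 0 obj");
    # "endobj" is not matched because there is no word boundary inside it.
    features["obj"] = len(re.findall(rb"\bobj\b", data))
    return features


if __name__ == "__main__":
    sample = (
        b"%PDF-1.7\n"
        b"1 0 obj\n<< /Type /Catalog /OpenAction 2 0 R >>\nendobj\n"
        b"2 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
    )
    print(structural_features(sample))
```

A vector of such counts could then be passed to a classifier. Note that this sketch scans raw bytes only: it does not decode names obfuscated with `#xx` hex escapes and does not look inside compressed object streams, both of which a real parser would need to handle.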

Author information

Corresponding author

Correspondence to Sonal Dabral.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Dabral, S., Agarwal, A., Mahajan, M., Kumar, S. (2017). Malicious PDF Files Detection Using Structural and Javascript Based Features. In: Kaushik, S., Gupta, D., Kharb, L., Chahal, D. (eds) Information, Communication and Computing Technology. ICICCT 2017. Communications in Computer and Information Science, vol 750. Springer, Singapore. https://doi.org/10.1007/978-981-10-6544-6_14

  • DOI: https://doi.org/10.1007/978-981-10-6544-6_14

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6543-9

  • Online ISBN: 978-981-10-6544-6

  • eBook Packages: Computer Science (R0)
