Abstract
Malicious PDF files have recently been considered one of the most dangerous threats to system security. The flexible, code-bearing nature of the PDF format enables an attacker to execute malicious code on a user's computer system. Many solutions have been developed by security agents to protect users' systems, but they remain inadequate. In this paper, we propose a method for detecting malicious PDF files using a machine learning approach. The proposed method extracts features from the PDF file structure and from embedded JavaScript code, relying on an advanced parsing mechanism. Instead of searching the PDF content for a specific attack, which is a quite complex procedure, we extract features that are often used in attacks. Moreover, we present experimental evidence for the choice of learning algorithm, which yields remarkably high accuracy compared to other existing methods.
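As an illustration of what extracting structural features that are often used in attacks might look like, the following is a minimal sketch and not the method of the paper. The keyword list, the function name `structural_features`, and the `obj_count` feature are all assumptions chosen for illustration; the abstract does not state the actual feature set or the parser used.

```python
import re

# Hypothetical keyword list (an assumption): PDF names commonly associated
# with active content. The paper's real feature set is not given here.
KEYWORDS = [b"/JS", b"/JavaScript", b"/OpenAction", b"/AA", b"/Launch"]


def structural_features(pdf_bytes: bytes) -> dict:
    """Count occurrences of selected PDF names in the raw bytes of a file."""
    features = {}
    for kw in KEYWORDS:
        # Require a non-alphanumeric character (or end of data) after the
        # name, so that a short name is not counted inside a longer one.
        pattern = re.escape(kw) + rb"(?![A-Za-z0-9])"
        features[kw.decode()] = len(re.findall(pattern, pdf_bytes))
    # Number of indirect object headers ("N G obj"); "endobj" is not matched
    # because there is no word boundary between "end" and "obj".
    features["obj_count"] = len(re.findall(rb"\bobj\b", pdf_bytes))
    return features


if __name__ == "__main__":
    sample = (
        b"%PDF-1.4\n"
        b"1 0 obj\n<< /Type /Catalog /OpenAction 2 0 R >>\nendobj\n"
        b"2 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
    )
    print(structural_features(sample))
```

Counts of this kind could then be fed to a classifier as a feature vector. A raw byte scan like this does not handle name obfuscation or compressed object streams, which is one reason a more careful parsing step would be needed in practice.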
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Dabral, S., Agarwal, A., Mahajan, M., Kumar, S. (2017). Malicious PDF Files Detection Using Structural and Javascript Based Features. In: Kaushik, S., Gupta, D., Kharb, L., Chahal, D. (eds) Information, Communication and Computing Technology. ICICCT 2017. Communications in Computer and Information Science, vol 750. Springer, Singapore. https://doi.org/10.1007/978-981-10-6544-6_14
DOI: https://doi.org/10.1007/978-981-10-6544-6_14
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6543-9
Online ISBN: 978-981-10-6544-6
eBook Packages: Computer Science, Computer Science (R0)