Identifying Drawbacks in Malicious PDF Detectors

  • Ahmed Falah
  • Lei Pan
  • Mohamed Abdelrazek
  • Robin Doss
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 878)


Abstract

Despite continuous countermeasure efforts, embedding malware in PDF documents and using them as a malware distribution mechanism remains a threat. This is due to the format's popularity for document exchange, users' lack of awareness of its dangers, and its ability to carry and execute malware. The academic community has proposed several malicious PDF detection tools to address this threat, all of which suffer from drawbacks that limit their utility. In this paper, we present the drawbacks of current state-of-the-art malicious PDF detectors. We did this by surveying all recent malicious PDF detectors and then comparatively evaluating the available tools. Our results show that concept drift is a major drawback of the detectors, even though many of them use machine learning approaches.


Keywords: Malicious PDF detection · Comparative evaluation · Concept drift



Acknowledgements

We would like to thank Mustafa Al-Saegh for helping with dataset cleaning and preparation. We would also like to thank VirusTotal, the owner of the Contagio dataset, and TPN for providing access to their files. Finally, we are grateful to the authors and creators of PDFrate and Slayer for providing access to their tools.


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. School of Information Technology, Deakin University, Burwood, Australia
