Heavy Log Reader: Learning the Context of Cyber Attacks Automatically with Paragraph Vector

  • Conference paper
Information Systems Security (ICISS 2017)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 10717)

Abstract

Cyberattack techniques evolve constantly, and detecting unknown malicious communication is a challenging task. Techniques based on pattern matching or on blacklists of malicious websites are easily evaded and are ineffective at detecting unknown malicious communication. Many behavior-based detection methods have therefore been proposed, which exploit the characteristics of drive-by-download attacks or C&C traffic. However, many of these methods are specialized to particular attack techniques, which limits their adaptability. Moreover, they require feature vectors to be designed for every attack method. This paper proposes a generic detection method that is independent of attack methods and does not require devising feature vectors. Our method uses Paragraph Vector, an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of text such as sentences, paragraphs, and documents, to learn the context in proxy server logs. We conducted cross-validation and timeline analysis with the D3M and BOS data in the MWS datasets. The experimental results show that our method can detect unknown malicious communication precisely in proxy server logs. The best F-measure reaches 0.99 for unknown drive-by-download attacks and 0.98 for unknown C&C traffic.
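
As an illustration of how proxy server log lines could be prepared as text for a Paragraph Vector model, the following minimal sketch splits each line into word-like tokens and pairs it with a tag. The abstract does not state the tokenization rules, so the delimiter set, the function names (tokenize_log_line, build_corpus), and the sample log lines are all assumptions made for this example, not the authors' procedure.

```python
import re

# Assumed delimiter set: whitespace plus common URL punctuation.
# The tokenization actually used in the paper is not given in the abstract.
DELIMITERS = re.compile(r"[\s/:?=&.]+")


def tokenize_log_line(line):
    """Split one proxy log line into a list of non-empty tokens."""
    return [tok for tok in DELIMITERS.split(line.strip()) if tok]


def build_corpus(lines):
    """Pair each tokenized line with an integer tag (its position).

    A (words, tag) pair is the general shape of input that Paragraph
    Vector implementations expect: a token list plus a document label.
    """
    return [(tokenize_log_line(line), idx) for idx, line in enumerate(lines)]


if __name__ == "__main__":
    # Hypothetical log lines in a simplified "METHOD URL STATUS" layout.
    sample = [
        "GET http://example.com/a/b.php?id=1 200",
        "POST http://example.org/gate.php 404",
    ]
    for words, tag in build_corpus(sample):
        print(tag, words)
```

Each resulting token list plays the role of a "paragraph"; a Paragraph Vector implementation would map it to a fixed-length vector, which a classifier could then label as benign or malicious.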

Acknowledgment

This work was supported by JSPS KAKENHI Grant Number 17K06455.

Author information

Corresponding author

Correspondence to Mamoru Mimura.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Mimura, M., Tanaka, H. (2017). Heavy Log Reader: Learning the Context of Cyber Attacks Automatically with Paragraph Vector. In: Shyamasundar, R., Singh, V., Vaidya, J. (eds) Information Systems Security. ICISS 2017. Lecture Notes in Computer Science, vol 10717. Springer, Cham. https://doi.org/10.1007/978-3-319-72598-7_9

  • DOI: https://doi.org/10.1007/978-3-319-72598-7_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-72597-0

  • Online ISBN: 978-3-319-72598-7

  • eBook Packages: Computer Science, Computer Science (R0)
