From NLP (Natural Language Processing) to MLP (Machine Language Processing)

  • Conference paper
Computer Network Security (MMM-ACNS 2010)

Abstract

Natural Language Processing (NLP) combined with machine learning techniques plays an important role in automatic text analysis. Motivated by the successful use of NLP for text classification problems in e-Participation, and building on our prior work on polymorphic shellcode detection, we apply classical NLP processes to the special case of malicious code analysis. Every malicious program is expressed in some form of machine language, ranging from manually crafted assembly code that exploits a buffer overflow to high-level languages such as JavaScript used in web-based attacks. We argue that well-known NLP analysis processes can be modified and applied to the malware analysis domain. By analogy with NLP, we call this process Machine Language Processing (MLP). In this paper, we take our e-Participation analysis architecture, extract its NLP techniques, and adapt them to the malware analysis process. As a proof of concept, we apply the adapted framework to malicious code examples from Metasploit.
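The analogy between text and machine code can be made concrete with a small sketch. The following Python fragment is an illustration only, not the framework described in the paper: it treats each disassembled instruction as a "word", abstracts operands into coarse classes (loosely comparable to stemming in NLP), and counts instruction bigrams as simple features. The function names, the register set, the operand classes, and the sample listing are all assumptions introduced for this example.

```python
import re
from collections import Counter

# Assumed, deliberately small x86 register set; a real analysis would need more.
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "esp", "ebp"}


def normalize_operand(op):
    """Map a concrete operand to a coarse class (hypothetical scheme)."""
    op = op.strip().lower()
    if op in REGISTERS:
        return "REG"
    if re.fullmatch(r"(0x[0-9a-f]+|\d+)", op):
        return "IMM"
    return "OTHER"


def tokenize(listing):
    """Turn an assembly listing into one token ("word") per instruction."""
    tokens = []
    for line in listing.splitlines():
        line = line.split(";", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        mnemonic, _, rest = line.partition(" ")
        operands = [normalize_operand(o) for o in rest.split(",") if o.strip()]
        tokens.append("_".join([mnemonic.lower()] + operands))
    return tokens


def bigram_counts(tokens):
    """Count adjacent token pairs, as one would count word bigrams in text."""
    return Counter(zip(tokens, tokens[1:]))


# Harmless, made-up listing used only to exercise the functions.
SAMPLE = """
mov eax, 1
add eax, ebx
mov ecx, eax   ; copy the result
add ecx, 2
ret
"""

if __name__ == "__main__":
    words = tokenize(SAMPLE)
    print(words)
    print(bigram_counts(words))
```

Collapsing operands into classes is one possible design choice: it keeps the vocabulary small so that listings differing only in register allocation or constants produce the same tokens, at the cost of discarding detail that a later analysis step might need.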




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Teufl, P., Payer, U., Lackner, G. (2010). From NLP (Natural Language Processing) to MLP (Machine Language Processing). In: Kotenko, I., Skormin, V. (eds) Computer Network Security. MMM-ACNS 2010. Lecture Notes in Computer Science, vol 6258. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14706-7_20

  • DOI: https://doi.org/10.1007/978-3-642-14706-7_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14705-0

  • Online ISBN: 978-3-642-14706-7

  • eBook Packages: Computer Science, Computer Science (R0)
