From NLP (Natural Language Processing) to MLP (Machine Language Processing)

Teufl, Peter; Payer, Udo; Lackner, Guenter

doi:10.1007/978-3-642-14706-7_20

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 6258)

Included in the following conference series:

International Conference on Mathematical Methods, Models, and Architectures for Computer Network Security

Abstract

Natural Language Processing (NLP), in combination with machine learning techniques, plays an important role in automatic text analysis. Motivated by the successful use of NLP in solving text classification problems in the area of e-Participation, and inspired by our prior work on polymorphic shellcode detection, we applied classical NLP processes to the special case of malicious code analysis. Any malicious program is based on some kind of machine language, ranging from manually crafted assembler code that exploits a buffer overflow to high-level languages such as JavaScript used in web-based attacks. We argue that well-known NLP analysis processes can be modified and applied to the malware analysis domain. By analogy with NLP, we call this process Machine Language Processing (MLP). In this paper, we take our e-Participation analysis architecture, extract the various NLP techniques, and adapt them to the malware analysis process. As a proof of concept, we apply the adapted framework to malicious code examples from Metasploit.
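The analogy between text and machine code can be pictured with a small sketch. It is not the authors' framework, whose details are not given in the abstract. It only assumes that a disassembled listing can be treated like a sentence, with mnemonics and operands as its "words", and it compares two listings by bag-of-words cosine similarity. The function names `tokenize` and `cosine` and the two short listings are hypothetical and chosen for illustration; they are not taken from Metasploit.

```python
import math
from collections import Counter


def tokenize(listing: str) -> list[str]:
    """Split an assembly listing into lowercase tokens (mnemonics and operands)."""
    tokens = []
    for line in listing.strip().splitlines():
        line = line.split(";", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        mnemonic, _, operands = line.partition(" ")
        tokens.append(mnemonic.lower())
        for operand in operands.split(","):
            operand = operand.strip().lower()
            if operand:
                tokens.append(operand)
    return tokens


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical listings, invented for this illustration only.
sample_a = """
xor eax, eax
push eax
mov ebx, esp
int 0x80
"""

sample_b = """
xor eax, eax
push eax
push ebx
int 0x80
"""

bag_a = Counter(tokenize(sample_a))
bag_b = Counter(tokenize(sample_b))
print(f"similarity: {cosine(bag_a, bag_b):.3f}")
```

A bag-of-words model ignores instruction order, so it is only the simplest possible stand-in for the NLP techniques the abstract refers to.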

Author information

Authors and Affiliations

Institute for Applied Information Processing and Communications (IAIK), Graz University of Technology
Peter Teufl
CAMPUS02, Graz University of Applied Science
Udo Payer
Studio78, Graz
Guenter Lackner

Editor information

Editors and Affiliations

Computer Security Research Group, St. Petersburg Institute for Informatics and Automation, 39, 14 Liniya, 199178, St.-Petersburg, Russia
Igor Kotenko
US Air Force, Binghamton University (SUNYI), 13902, Binghamton, NY, USA
Victor Skormin

About this paper

Cite this paper

Teufl, P., Payer, U., Lackner, G. (2010). From NLP (Natural Language Processing) to MLP (Machine Language Processing). In: Kotenko, I., Skormin, V. (eds) Computer Network Security. MMM-ACNS 2010. Lecture Notes in Computer Science, vol 6258. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14706-7_20

DOI: https://doi.org/10.1007/978-3-642-14706-7_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14705-0
Online ISBN: 978-3-642-14706-7
eBook Packages: Computer Science, Computer Science (R0)
