Detecting Unknown Network Attacks Using Language Models

Rieck, Konrad; Laskov, Pavel

doi:10.1007/11790754_5

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 4064)

Included in the following conference series:

International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment

Abstract

We propose a method for network intrusion detection based on language models such as n-grams and words. Our method proceeds by extracting these models from TCP connection payloads and applying unsupervised anomaly detection. The core of our approach is the linear-time computation of similarity measures between language models stored in trie data structures.
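
The trie-based comparison can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes raw byte n-gram counts (the paper may normalize or weight them), and it uses the Manhattan distance only as one example of a measure that can be computed by a single joint traversal of two tries. The names `Trie`, `ngram_trie`, `subtree_sum` and `manhattan` are hypothetical. Each node of either trie is visited at most once, so the cost is linear in the sizes of the two tries rather than in the size of the full n-gram space.

```python
class Trie:
    """Trie node; `count` is the number of occurrences of the n-gram ending here."""

    def __init__(self):
        self.children = {}  # maps a byte value (int) to a child Trie
        self.count = 0

    def insert(self, key: bytes) -> None:
        node = self
        for b in key:  # iterating over bytes yields ints
            child = node.children.get(b)
            if child is None:
                child = node.children[b] = Trie()
            node = child
        node.count += 1


def ngram_trie(payload: bytes, n: int) -> Trie:
    """Store every byte n-gram of the payload, with its count, in a trie."""
    root = Trie()
    for i in range(len(payload) - n + 1):
        root.insert(payload[i:i + n])
    return root


def subtree_sum(node: Trie) -> int:
    """Total count of all n-grams stored at or below this node."""
    return node.count + sum(subtree_sum(c) for c in node.children.values())


def manhattan(x: Trie, y: Trie) -> int:
    """Sum of |count_x(w) - count_y(w)| over all n-grams w, in one joint traversal."""
    total = abs(x.count - y.count)
    for b in x.children.keys() | y.children.keys():
        cx, cy = x.children.get(b), y.children.get(b)
        if cy is None:    # branch only in x: each n-gram below differs by its full count
            total += subtree_sum(cx)
        elif cx is None:  # branch only in y
            total += subtree_sum(cy)
        else:             # shared prefix: descend in both tries
            total += manhattan(cx, cy)
    return total


x = ngram_trie(b"abcab", 2)  # ab:2, bc:1, ca:1
y = ngram_trie(b"abd", 2)    # ab:1, bd:1
print(manhattan(x, y))       # → 4
```

Because n-grams that share a prefix share a path, a branch present in only one trie is charged in a single pass over that subtree instead of looking up each n-gram separately in the other model.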

Results of our experiments, conducted on two datasets of network traffic, demonstrate the importance of higher-order n-grams for the detection of unknown network attacks. Our method is also suitable for language models based on words, which are more amenable to practical security applications. An implementation of our system achieved a detection accuracy of over 80% with no false positives on instances of recent attacks in HTTP, FTP and SMTP traffic.
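
Word-based models and the unsupervised detection step can be sketched in the same spirit, under several assumptions. Words are taken to be maximal runs of bytes between delimiter bytes from a hypothetical `DELIMS` set; counts are kept in a `Counter` (a hash map swapped in for the trie only to keep the example short); and the detector is a simple mean distance to the k nearest training payloads, chosen for illustration and not necessarily the anomaly detector used in the paper. The names `word_counts`, `manhattan` and `knn_score` are hypothetical.

```python
from collections import Counter

# Hypothetical delimiter set (an assumption); the paper's actual delimiters may differ.
DELIMS = frozenset(b" \t\r\n.,:;/?&=()<>\"'")


def word_counts(payload: bytes) -> Counter:
    """Count maximal runs of non-delimiter bytes ("words") in a payload."""
    counts = Counter()
    word = bytearray()
    for b in payload:  # iterating over bytes yields ints
        if b in DELIMS:
            if word:  # close the current word, if any
                counts[bytes(word)] += 1
                word.clear()
        else:
            word.append(b)
    if word:  # trailing word without a final delimiter
        counts[bytes(word)] += 1
    return counts


def manhattan(x: Counter, y: Counter) -> int:
    """Manhattan distance between two word-count vectors (missing words count as 0)."""
    return sum(abs(x[w] - y[w]) for w in x.keys() | y.keys())


def knn_score(z: Counter, training: list, k: int = 3) -> float:
    """Illustrative anomaly score: mean distance from z to its k nearest training payloads.

    Assumes `training` is non-empty.
    """
    dists = sorted(manhattan(z, t) for t in training)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


train = [word_counts(b"GET /index.html HTTP/1.1\r\n"),
         word_counts(b"GET /about.html HTTP/1.1\r\n")]
normal = word_counts(b"GET /index.html HTTP/1.1\r\n")
attack = word_counts(b"GET /" + b"A" * 50 + b" HTTP/1.1\r\n")
print(knn_score(normal, train, k=1))  # → 0.0
print(knn_score(attack, train, k=1))  # → 3.0
```

Payloads whose words rarely occur in normal traffic end up far from all training payloads and receive a high score; a threshold on that score would then separate alerts from normal connections.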

Author information

Authors and Affiliations

Fraunhofer-FIRST.IDA, Kekuléstr. 7, 12489, Berlin, Germany
Konrad Rieck & Pavel Laskov

Editor information

Editors and Affiliations

RWE AG, Opernplatz 1, 45128, Essen, Germany
Roland Büschkes
Wilhelm-Schickard-Institute for Computer Science, University of Tübingen, Tübingen, Germany
Pavel Laskov

About this paper

Cite this paper

Rieck, K., Laskov, P. (2006). Detecting Unknown Network Attacks Using Language Models. In: Büschkes, R., Laskov, P. (eds) Detection of Intrusions and Malware & Vulnerability Assessment. DIMVA 2006. Lecture Notes in Computer Science, vol 4064. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11790754_5

DOI: https://doi.org/10.1007/11790754_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36014-8
Online ISBN: 978-3-540-36017-9
eBook Packages: Computer Science, Computer Science (R0)
