Abstract
We propose a method for network intrusion detection based on language models such as n-grams and words. Our method proceeds by extracting these models from TCP connection payloads and applying unsupervised anomaly detection. The essential part of our approach is linear-time computation of similarity measures between language models stored in trie data structures.
Results of our experiments conducted on two datasets of network traffic demonstrate the importance of higher-order n-grams for detection of unknown network attacks. Our method is also suitable for language models based on words, which are more amenable to practical security applications. An implementation of our system achieved detection accuracy of over 80% with no false positives on instances of recent attacks in HTTP, FTP and SMTP traffic.
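The idea of comparing n-gram models stored in tries can be illustrated with a minimal sketch. It is not the authors' implementation: the names `TrieNode`, `build_trie`, `subtree_total` and `manhattan` are hypothetical, and the Manhattan distance is chosen here only as one simple example of a similarity measure. Each payload is turned into a trie of byte n-grams with occurrence counts, and the two tries are then traversed in parallel, so that each stored node is visited at most once.

```python
from typing import Dict


class TrieNode:
    """One node per byte; `count` is nonzero only where an n-gram ends."""
    __slots__ = ("children", "count")

    def __init__(self) -> None:
        self.children: Dict[int, "TrieNode"] = {}
        self.count = 0


def build_trie(payload: bytes, n: int) -> TrieNode:
    """Insert every length-n window of `payload` into a trie, counting repeats."""
    root = TrieNode()
    for i in range(len(payload) - n + 1):
        node = root
        for byte in payload[i:i + n]:
            node = node.children.setdefault(byte, TrieNode())
        node.count += 1
    return root


def subtree_total(node: TrieNode) -> int:
    """Sum of all n-gram counts stored at or below `node`."""
    return node.count + sum(subtree_total(c) for c in node.children.values())


def manhattan(a: TrieNode, b: TrieNode) -> int:
    """Sum of |count_a - count_b| over all n-grams, via a parallel traversal."""
    dist = abs(a.count - b.count)
    for byte, child_a in a.children.items():
        child_b = b.children.get(byte)
        if child_b is not None:
            dist += manhattan(child_a, child_b)   # n-gram prefix present in both
        else:
            dist += subtree_total(child_a)        # n-grams present only in `a`
    for byte, child_b in b.children.items():
        if byte not in a.children:
            dist += subtree_total(child_b)        # n-grams present only in `b`
    return dist


if __name__ == "__main__":
    # Illustrative payloads, not taken from the paper's datasets.
    normal = build_trie(b"GET /index.html HTTP/1.0", 3)
    other = build_trie(b"GET /index.php HTTP/1.0", 3)
    print(manhattan(normal, other))
```

In an unsupervised setting, a distance of this kind could serve as the basis of an anomaly score, for example the distance of a new connection payload to previously observed traffic; how the score is derived in the actual system is not specified in the abstract.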
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rieck, K., Laskov, P. (2006). Detecting Unknown Network Attacks Using Language Models. In: Büschkes, R., Laskov, P. (eds) Detection of Intrusions and Malware & Vulnerability Assessment. DIMVA 2006. Lecture Notes in Computer Science, vol 4064. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11790754_5
DOI: https://doi.org/10.1007/11790754_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36014-8
Online ISBN: 978-3-540-36017-9
eBook Packages: Computer Science, Computer Science (R0)