Detecting Unknown Network Attacks Using Language Models

  • Conference paper
Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA 2006)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 4064)

Abstract

We propose a method for network intrusion detection based on language models such as n-grams and words. Our method proceeds by extracting these models from TCP connection payloads and applying unsupervised anomaly detection. The essential part of our approach is linear-time computation of similarity measures between language models stored in trie data structures.
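
The trie-based comparison described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `TrieNode`, `build_ngram_trie`, and `manhattan_distance` are hypothetical, the Manhattan distance is chosen only as one example of a similarity measure, and both tries are assumed to be built with the same n. Each trie node is visited at most once, so the comparison runs in time linear in the size of the two tries.

```python
from typing import Dict, Optional


class TrieNode:
    """Trie node; leaves at depth n hold the count of one n-gram."""
    __slots__ = ("children", "count")

    def __init__(self) -> None:
        self.children: Dict[int, "TrieNode"] = {}
        self.count = 0


def build_ngram_trie(payload: bytes, n: int) -> TrieNode:
    """Insert every byte n-gram of the payload into a trie."""
    root = TrieNode()
    for i in range(len(payload) - n + 1):
        node = root
        for byte in payload[i:i + n]:
            node = node.children.setdefault(byte, TrieNode())
        node.count += 1  # node is now the leaf for this n-gram
    return root


def subtree_total(node: Optional[TrieNode]) -> int:
    """Sum of all leaf counts below a node (0 for a missing node)."""
    if node is None:
        return 0
    if not node.children:
        return node.count
    return sum(subtree_total(child) for child in node.children.values())


def manhattan_distance(a: TrieNode, b: TrieNode) -> int:
    """Sum of |count_a - count_b| over all n-grams, via parallel traversal.

    Assumption: both tries were built with the same n.
    """
    if not a.children and not b.children:
        return abs(a.count - b.count)
    dist = 0
    for byte in a.children.keys() | b.children.keys():
        child_a = a.children.get(byte)
        child_b = b.children.get(byte)
        if child_a is not None and child_b is not None:
            dist += manhattan_distance(child_a, child_b)
        else:
            # n-grams below this branch occur in only one payload
            dist += subtree_total(child_a) + subtree_total(child_b)
    return dist
```

For example, `manhattan_distance(build_ngram_trie(b"abab", 2), build_ngram_trie(b"abcb", 2))` compares the 2-gram counts of two short payloads; identical payloads yield a distance of 0.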

Results of our experiments on two datasets of network traffic demonstrate the importance of higher-order n-grams for the detection of unknown network attacks. Our method is also suitable for language models based on words, which are better suited to practical security applications. An implementation of our system achieved a detection accuracy of over 80% with no false positives on instances of recent attacks in HTTP, FTP and SMTP traffic.
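
As a rough illustration of a word-based language model combined with unsupervised anomaly detection, the sketch below splits payloads into words and scores each connection by its mean distance to its nearest neighbours. Everything here is an assumption for illustration: the delimiter set, the Manhattan distance, the k-nearest-neighbour score, and the names `word_model`, `manhattan`, and `knn_score` are hypothetical and not taken from the paper.

```python
import re
from collections import Counter
from typing import List

# Assumed delimiter set (space, tab, CR, LF); the paper's choice may differ.
DELIMITERS = re.compile(rb"[ \t\r\n]+")


def word_model(payload: bytes) -> Counter:
    """Count the words obtained by splitting the payload at delimiters."""
    return Counter(w for w in DELIMITERS.split(payload) if w)


def manhattan(a: Counter, b: Counter) -> int:
    """Sum of absolute count differences over all words in either model."""
    return sum(abs(a[w] - b[w]) for w in a.keys() | b.keys())


def knn_score(models: List[Counter], index: int, k: int = 2) -> float:
    """Mean distance from models[index] to its k nearest other models.

    No labels are used; a high score marks a connection as unusual.
    """
    if len(models) < 2:
        raise ValueError("need at least two models to compute a score")
    dists = sorted(
        manhattan(models[index], other)
        for j, other in enumerate(models)
        if j != index
    )
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


# Hypothetical payloads: three ordinary HTTP requests and one odd payload.
payloads = [
    b"GET /index.html HTTP/1.0\r\n",
    b"GET /about.html HTTP/1.0\r\n",
    b"GET /index.html HTTP/1.0\r\n",
    b"\x90\x90\x90\x90 /bin/sh",
]
models = [word_model(p) for p in payloads]
scores = [knn_score(models, i) for i in range(len(models))]
```

In this toy data the last payload shares no words with the others, so it receives the highest score. In practice the threshold separating normal from anomalous scores would have to be chosen on real traffic.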

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rieck, K., Laskov, P. (2006). Detecting Unknown Network Attacks Using Language Models. In: Büschkes, R., Laskov, P. (eds) Detection of Intrusions and Malware & Vulnerability Assessment. DIMVA 2006. Lecture Notes in Computer Science, vol 4064. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11790754_5

  • DOI: https://doi.org/10.1007/11790754_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-36014-8

  • Online ISBN: 978-3-540-36017-9

  • eBook Packages: Computer Science (R0)
