
Content-Based Methodology for Anomaly Detection on the Web

  • Conference paper
Advances in Web Intelligence (AWIC 2003)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2663)

Abstract

As became apparent after the tragic events of September 11, 2001, terrorist organizations and other criminal groups are increasingly using legitimate means of Internet access to conduct their malicious activities. Such actions cannot be detected by existing intrusion detection systems, which are generally aimed at protecting computer systems and networks from various kinds of “cyber attacks”. Preparations for an attack against human society itself can be detected only by analyzing the content accessed by users. The proposed study aims to develop an innovative methodology for detecting abnormal activity that uses web content as the audit information provided to the detection system. The new behavior-based detection method learns normal behavior by applying an unsupervised clustering algorithm to the content of publicly available web pages viewed by a group of similar users. In this paper, we represent page content with the well-known vector space model. The resulting content models of normal behavior are used in real time to reveal deviations from normal behavior at a specific location on the network. The sensitivity of the detection algorithm is controlled by a threshold parameter. The method is evaluated by the trade-off between the detection rate (true positive rate, TP) and the false positive rate (FP).
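The abstract describes the learn-then-detect pipeline only at a high level. The following is a minimal, standard-library Python sketch of that general idea under several assumptions that the abstract does not state: pages are weighted with smoothed tf-idf (one common vector-space weighting), k-means with cosine similarity and deterministic farthest-first seeding stands in for the unspecified unsupervised clustering algorithm, and a single page is flagged when its highest cosine similarity to any "normal" cluster centroid falls below the threshold. All function names (`fit_normal_model`, `is_anomalous`, `detection_rates`, and so on) are hypothetical illustrations, not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lower-case alphabetic tokens; no stemming or stop-word removal (simplification).
    return re.findall(r"[a-z]+", text.lower())


def build_idf(token_lists):
    # Smoothed IDF, log(N / df) + 1, computed over the training pages (assumed weighting).
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    return {t: math.log(n / d) + 1.0 for t, d in df.items()}


def tfidf_vector(tokens, idf):
    # Sparse, L2-normalised tf-idf vector; terms outside the training vocabulary are ignored.
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    # Cosine similarity of two sparse vectors; an empty vector scores 0.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(vecs):
    total = Counter()
    for v in vecs:
        total.update(v)  # adds the float weights term by term
    return {t: w / len(vecs) for t, w in total.items()}


def kmeans(vecs, k, iters=10):
    # Stand-in clustering: cosine k-means seeded by picking the least similar page each time.
    centroids = [vecs[0]]
    while len(centroids) < k:
        centroids.append(min(vecs, key=lambda v: max(cosine(v, c) for c in centroids)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vecs:
            best = max(range(k), key=lambda i: cosine(v, centroids[i]))
            clusters[best].append(v)
        centroids = [mean_vector(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids


def fit_normal_model(pages, k):
    # Learn a content profile of "normal" behavior from pages viewed by similar users.
    token_lists = [tokenize(p) for p in pages]
    idf = build_idf(token_lists)
    vecs = [tfidf_vector(t, idf) for t in token_lists]
    return idf, kmeans(vecs, k)


def is_anomalous(page, model, threshold):
    # Flag a page whose best similarity to any normal centroid is below the threshold.
    idf, centroids = model
    vec = tfidf_vector(tokenize(page), idf)
    return max(cosine(vec, c) for c in centroids) < threshold


def detection_rates(normal_pages, abnormal_pages, model, threshold):
    # Return (TP rate, FP rate) for one threshold value.
    tp = sum(is_anomalous(p, model, threshold) for p in abnormal_pages) / len(abnormal_pages)
    fp = sum(is_anomalous(p, model, threshold) for p in normal_pages) / len(normal_pages)
    return tp, fp


if __name__ == "__main__":
    training = [
        "football match score league goal",
        "league season goal team player",
        "weather forecast rain sunny",
        "rain forecast weekend weather",
    ]
    model = fit_normal_model(training, k=2)
    normal_test = ["team player goal match", "rain weather forecast"]
    abnormal_test = ["chemical explosive detonator recipe"]
    for threshold in (0.0, 0.3, 1.01):
        print(threshold, detection_rates(normal_test, abnormal_test, model, threshold))
```

In this toy run, raising `threshold` flags more pages: it can only raise the detection rate, at the cost of more false positives. Sweeping it therefore traces the TP/FP trade-off by which the abstract says the method is evaluated. The toy pages and the choice `k=2` are invented purely for illustration.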

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Last, M., Shapira, B., Elovici, Y., Zaafrany, O., Kandel, A. (2003). Content-Based Methodology for Anomaly Detection on the Web. In: Menasalvas, E., Segovia, J., Szczepaniak, P.S. (eds) Advances in Web Intelligence. AWIC 2003. Lecture Notes in Computer Science, vol 2663. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44831-4_13

  • DOI: https://doi.org/10.1007/3-540-44831-4_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40124-7

  • Online ISBN: 978-3-540-44831-0

  • eBook Packages: Springer Book Archive
