Leveraging One-Class SVM and Semantic Analysis to Detect Anomalous Content

  • Ozgur Yilmazel
  • Svetlana Symonenko
  • Niranjan Balasubramanian
  • Elizabeth D. Liddy
Part of the Integrated Series In Information Systems book series (ISIS, volume 18)

Experiments were conducted to test several hypotheses on methods for improving document categorization for the malicious insider threat problem within the Intelligence Community. Bag-of-words (BOW) representations of documents were compared to Natural Language Processing (NLP) based representations in both the typical and one-class categorization problems using the Support Vector Machine algorithm. Results from our Semantic Anomaly Monitoring (SAM) system show that the NLP features significantly improved classifier performance over the BOW approach in terms of both precision and recall, while using far fewer features. The one-class algorithm using NLP features demonstrated robustness when tested on new domains.
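To make the terms in the abstract concrete, the following is a minimal sketch of one-class categorization over bag-of-words vectors, scored with precision and recall. It is not the SAM system or the authors' method: a cosine-similarity threshold against a centroid stands in for the one-class SVM, and the function names, toy documents, and the 0.3 threshold are all assumptions made for illustration.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words: lowercase token counts, ignoring word order."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit_one_class(on_topic_docs):
    """Train on positive (on-topic) documents only by summing their BOW vectors."""
    centroid = Counter()
    for doc in on_topic_docs:
        centroid.update(bow(doc))
    return centroid


def is_on_topic(centroid, text, threshold=0.3):
    """Stand-in for a one-class SVM decision; the threshold is an assumed value."""
    return cosine(centroid, bow(text)) >= threshold


def precision_recall(predicted, actual):
    """Precision and recall for the positive (on-topic) class."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(not p and a for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical training set: on-topic documents only (the one-class setting).
train = [
    "missile export controls and nonproliferation treaty",
    "nuclear nonproliferation treaty inspection report",
    "export controls on nuclear material",
]

# Hypothetical labelled test set: (text, is_on_topic).
test = [
    ("nuclear treaty inspection", True),
    ("export controls report", True),
    ("uranium enrichment facility", True),  # on topic, but shares no training words
    ("football scores and weekend fixtures", False),
    ("holiday recipes and travel tips", False),
]

centroid = fit_one_class(train)
predicted = [is_on_topic(centroid, text) for text, _ in test]
actual = [label for _, label in test]
precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy data the third test document is on topic but uses none of the training vocabulary, so a representation based purely on surface words misses it and recall drops. That only illustrates the general limitation of word-count features; it says nothing about the results reported in the chapter.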

Keywords

  • Support Vector Machine
  • Natural Language Processing
  • Document Vector
  • Insider Threat
  • Security Informatics

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Ozgur Yilmazel (1)
  • Svetlana Symonenko (1)
  • Niranjan Balasubramanian (1)
  • Elizabeth D. Liddy (1)

  1. Center for Natural Language Processing, Syracuse University, USA
