Active and Semi-supervised Data Domain Description

  • Nico Görnitz
  • Marius Kloft
  • Ulf Brefeld
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5781)

Abstract

Data domain description techniques aim at deriving concise descriptions of objects belonging to a category of interest. For instance, the support vector domain description (SVDD) learns a hypersphere enclosing the bulk of the provided unlabeled data, such that points lying outside of the ball are considered anomalous. However, relevant information such as expert and background knowledge remains unused in the unsupervised setting. In this paper, we rephrase data domain description as a semi-supervised learning task; that is, we propose a semi-supervised generalization of data domain description (SSSVDD) that processes both unlabeled and labeled examples. The corresponding optimization problem is non-convex. We translate it into an unconstrained, continuous problem that can be optimized accurately by gradient-based techniques. Furthermore, we devise an effective active learning strategy to query low-confidence observations. Our empirical evaluation on network intrusion detection and object recognition tasks shows that our SSSVDDs consistently outperform baseline methods in relevant learning settings.
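To make the hypersphere idea concrete, the following is a minimal sketch and not the authors' SSSVDD. It assumes a linear (kernel-free) and purely unsupervised setting and minimizes the standard unconstrained SVDD objective s + C · Σ max(0, ‖x − c‖² − s), with s = R², by plain subgradient descent. The labeled-example terms of the paper are omitted. The function names (`fit_sphere`, `score`, `query_index`), the toy data, and the hyperparameters are all hypothetical. The query rule, which picks the point closest to the sphere boundary, is only one plausible reading of "low-confidence observations".

```python
def sq_dist(x, c):
    """Squared Euclidean distance between point x and center c."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def fit_sphere(points, C=0.1, lr=0.05, epochs=500):
    """Subgradient descent on s + C * sum(max(0, ||x - c||^2 - s)), with s = R^2."""
    n, dim = len(points), len(points[0])
    c = [sum(p[d] for p in points) / n for d in range(dim)]  # start at the mean
    s = sum(sq_dist(p, c) for p in points) / n  # start at the mean squared distance
    for _ in range(epochs):
        grad_s = 1.0
        pull = [0.0] * dim  # sum of (c - x) over points currently outside the ball
        for p in points:
            if sq_dist(p, c) > s:  # hinge term is active for this point
                grad_s -= C
                for d in range(dim):
                    pull[d] += c[d] - p[d]
        s = max(0.0, s - lr * grad_s)  # keep the squared radius non-negative
        c = [c[d] - lr * 2.0 * C * pull[d] for d in range(dim)]
    return c, s


def score(x, c, s):
    """Anomaly score: positive outside the sphere, negative inside."""
    return sq_dist(x, c) - s


def query_index(points, c, s):
    """Index of the point closest to the boundary (assumed low-confidence rule)."""
    return min(range(len(points)), key=lambda i: abs(score(points[i], c, s)))


# Toy data (assumption): a small grid of "normal" points plus two far outliers.
coords = [-1.0, -0.5, 0.0, 0.5, 1.0]
points = [(a, b) for a in coords for b in coords] + [(8.0, 8.0), (-8.0, -8.0)]

center, radius_sq = fit_sphere(points)
flagged = [p for p in points if score(p, center, radius_sq) > 0]
to_label = points[query_index(points, center, radius_sq)]
print(center, radius_sq, flagged, to_label)
```

In this sketch, C controls how many points may fall outside the ball: the radius stops shrinking once roughly 1/C points lie outside. A semi-supervised variant would add terms for labeled examples, which makes the problem non-convex as the abstract notes; that part is not shown here.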

Keywords

Support Vector Machine, Active Learning, Unlabeled Data, Object Recognition Task, Domain Description
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Nico Görnitz (1)
  • Marius Kloft (1)
  • Ulf Brefeld (1)

  1. Machine Learning Group, Technische Universität Berlin, Berlin, Germany
