Abstract
Data domain description techniques aim to derive concise descriptions of objects belonging to a category of interest. For instance, the support vector domain description (SVDD) learns a hypersphere enclosing the bulk of the provided unlabeled data, such that points lying outside the ball are considered anomalous. However, relevant information such as expert and background knowledge remains unused in the unsupervised setting. In this paper, we rephrase data domain description as a semi-supervised learning task: we propose a semi-supervised generalization of data domain description (SSSVDD) that processes both unlabeled and labeled examples. The corresponding optimization problem is non-convex. We translate it into an unconstrained, continuous problem that can be optimized accurately by gradient-based techniques. Furthermore, we devise an effective active learning strategy to query low-confidence observations. Our empirical evaluation on network intrusion detection and object recognition tasks shows that our SSSVDDs consistently outperform baseline methods in relevant learning settings.
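To make the hypersphere idea concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes a deliberately simplified description: the center is the plain centroid of the data rather than the solution of the SVDD optimization problem, no kernel is used, and no labels are involved. The radius is chosen so that a given share of the points lies inside the ball. The query rule (pick the points closest to the sphere boundary) is likewise only one plausible reading of "low-confidence observations". The function names `fit_sphere`, `anomaly_score`, and `query_low_confidence` are hypothetical.

```python
import math


def fit_sphere(points, outlier_fraction=0.05):
    """Simplified data description (assumption: NOT the SVDD optimizer).

    The center is the centroid; the radius is the distance that encloses
    roughly (1 - outlier_fraction) of the points.
    """
    dim = len(points[0])
    center = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    dists = sorted(math.dist(p, center) for p in points)
    # Position of the distance that encloses the requested share of the data.
    idx = int(math.ceil((1 - outlier_fraction) * len(dists))) - 1
    idx = max(0, min(len(dists) - 1, idx))
    return center, dists[idx]


def anomaly_score(point, center, radius):
    """Positive outside the ball (anomalous), negative inside (normal)."""
    return math.dist(point, center) - radius


def query_low_confidence(points, center, radius, k=3):
    """Indices of the k points closest to the sphere boundary.

    Assumption: closeness to the boundary is used as a stand-in for
    low confidence; the paper's actual query strategy may differ.
    """
    order = sorted(
        range(len(points)),
        key=lambda i: abs(anomaly_score(points[i], center, radius)),
    )
    return order[:k]


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
    c, r = fit_sphere(data, outlier_fraction=0.0)
    print("center:", c, "radius:", r)
    print("score of (3, 0):", anomaly_score((3.0, 0.0), c, r))
    print("points to query:", query_low_confidence(data, c, r, k=2))
```

In an active learning loop, the returned indices would be handed to an expert for labeling, and those labels would then feed a semi-supervised model such as the SSSVDD described above.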
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Görnitz, N., Kloft, M., Brefeld, U. (2009). Active and Semi-supervised Data Domain Description. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2009. Lecture Notes in Computer Science, vol 5781. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04180-8_44
DOI: https://doi.org/10.1007/978-3-642-04180-8_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04179-2
Online ISBN: 978-3-642-04180-8
eBook Packages: Computer Science, Computer Science (R0)