Active and Semi-supervised Data Domain Description

Görnitz, Nico; Kloft, Marius; Brefeld, Ulf

doi:10.1007/978-3-642-04180-8_44

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5781)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

Data domain description techniques aim at deriving concise descriptions of objects belonging to a category of interest. For instance, the support vector domain description (SVDD) learns a hypersphere enclosing the bulk of the provided unlabeled data, such that points lying outside the ball are considered anomalous. However, relevant information such as expert and background knowledge remains unused in the unsupervised setting. In this paper, we rephrase data domain description as a semi-supervised learning task; that is, we propose a semi-supervised generalization of data domain description (SSSVDD) that processes both unlabeled and labeled examples. The corresponding optimization problem is non-convex. We translate it into an unconstrained, continuous problem that can be optimized accurately by gradient-based techniques. Furthermore, we devise an effective active learning strategy to query low-confidence observations. Our empirical evaluation on network intrusion detection and object recognition tasks shows that our SSSVDDs consistently outperform baseline methods in relevant learning settings.
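
As a rough illustration of the hypersphere idea only, the Python sketch below fits a ball in input space by subgradient descent on an unconstrained soft-boundary objective, R² + C · Σ max(0, ‖xᵢ − c‖² − R²). It is not the SSSVDD proposed in the paper: it uses no kernel and no labeled examples, and the function names, the trade-off parameter C, the learning rate, and the epoch count are all assumptions made for this example.

```python
def squared_distance(x, center):
    """Squared Euclidean distance between a point and the ball center."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, center))


def fit_hypersphere(points, C=0.1, lr=0.01, epochs=500):
    """Subgradient descent on  r2 + C * sum_i max(0, ||x_i - c||^2 - r2).

    Illustrative assumptions: plain input space (no kernel), unlabeled
    data only, fixed step size. Returns (center, squared_radius).
    """
    n, dim = len(points), len(points[0])
    # Start at the data mean with zero radius.
    center = [sum(p[d] for p in points) / n for d in range(dim)]
    r2 = 0.0
    for _ in range(epochs):
        grad_center = [0.0] * dim
        grad_r2 = 1.0  # derivative of the r2 term itself
        for p in points:
            if squared_distance(p, center) > r2:
                # The hinge term is active for points outside the ball.
                grad_r2 -= C
                for d in range(dim):
                    grad_center[d] -= 2.0 * C * (p[d] - center[d])
        center = [center[d] - lr * grad_center[d] for d in range(dim)]
        r2 = max(0.0, r2 - lr * grad_r2)  # keep the squared radius non-negative
    return center, r2


def is_anomalous(x, center, r2):
    """A point is flagged as anomalous when it lies outside the learned ball."""
    return squared_distance(x, center) > r2
```

Under these assumptions, a larger C penalizes points outside the ball more heavily, so the ball grows to enclose more of the data; a smaller C leaves more training points outside and flags them as anomalous.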

Author information

Authors and Affiliations

Machine Learning Group, Technische Universität Berlin, Franklinstr. 28/29, 10587, Berlin, Germany
Nico Görnitz, Marius Kloft & Ulf Brefeld

Editor information

Editors and Affiliations

NICTA, Locked Bag 8001, Canberra, 2601, Australia and Helsinki Institute of IT, Finland
Wray Buntine
Dept. of Knowledge Technologies, Jožef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
Marko Grobelnik & Dunja Mladenić
University College London, The Centre for Computational Statistics and Machine Learning Department of Computer Science, Gower St., WC1E 6BT, London, UK
John Shawe-Taylor

About this paper

Cite this paper

Görnitz, N., Kloft, M., Brefeld, U. (2009). Active and Semi-supervised Data Domain Description. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2009. Lecture Notes in Computer Science, vol 5781. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04180-8_44

DOI: https://doi.org/10.1007/978-3-642-04180-8_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04179-2
Online ISBN: 978-3-642-04180-8
eBook Packages: Computer Science, Computer Science (R0)
