Taming the Curse of Dimensionality in Kernels and Novelty Detection

  • Conference paper

Part of the book series: Advances in Soft Computing (AINSC, volume 34)

Abstract

The curse of dimensionality is a well-known but not entirely well-understood phenomenon. Too much data, in terms of the number of input variables, is not always a good thing. This is especially true when the problem involves unsupervised learning or supervised learning with unbalanced data (many negative observations but few positive ones). This paper addresses two issues involving high-dimensional data. The first explores the behavior of kernels in high-dimensional data: it is shown that variance, especially when contributed by meaningless noisy variables, confounds learning methods. The second part illustrates methods to overcome dimensionality problems in unsupervised learning by utilizing subspace models. The modeling approach involves novelty detection with the one-class SVM.
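As a rough illustration of the first issue, the sketch below shows how Gaussian (RBF) kernel values lose contrast as meaningless noisy variables are added. The data layout, kernel choice, and the 1/d bandwidth heuristic are illustrative assumptions, not settings taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(X, gamma):
    # Pairwise Gaussian (RBF) kernel matrix for the rows of X.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

n_points, n_informative = 50, 5
signal = rng.normal(size=(n_points, n_informative))  # meaningful variables

for n_noise in (0, 20, 100, 500):
    noise = rng.normal(size=(n_points, n_noise))     # meaningless noisy variables
    X = np.hstack([signal, noise])
    K = rbf_kernel(X, gamma=1.0 / X.shape[1])        # common 1/d bandwidth heuristic
    off_diag = K[~np.eye(n_points, dtype=bool)]
    print(f"{n_noise:4d} noise dims: mean K = {off_diag.mean():.3f}, "
          f"std K = {off_diag.std():.4f}")
```

As the noise dimensions accumulate, the spread of the off-diagonal kernel values collapses, so every pair of points looks about equally similar and kernel-based learners lose discriminating power; the variance contributed by the noisy variables swamps the signal.

For the second part, the sketch below pairs a one-class SVM with a random-subspace scheme: many detectors are fit on small random subsets of the variables and their decision values are naively averaged. The subspace size, ensemble size, and averaging rule here are hypothetical choices for illustration; the paper's actual subspace models and fusion procedure may differ.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
n_noise = 200
# "Normal" training data plus novelties shifted only in the first five
# variables; the remaining columns are pure noise.
train = np.hstack([rng.normal(size=(300, 5)),
                   rng.normal(size=(300, n_noise))])
novel = np.hstack([rng.normal(loc=3.0, size=(30, 5)),
                   rng.normal(size=(30, n_noise))])

def novelty_score(train, test, cols):
    # Fit a one-class SVM on the chosen columns; lower scores = more novel.
    model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)
    model.fit(train[:, cols])
    return model.decision_function(test[:, cols])

all_cols = np.arange(train.shape[1])
# Random-subspace ensemble: 25 one-class SVMs, each on 10 random variables.
subspaces = [rng.choice(all_cols, size=10, replace=False) for _ in range(25)]
ensemble = np.mean([novelty_score(train, novel, s) for s in subspaces], axis=0)

print("full-space mean score on novelties:", novelty_score(train, novel, all_cols).mean())
print("subspace-ensemble mean score:     ", ensemble.mean())
```

Averaging raw decision values across subspaces is the crudest possible fusion; it is used here only to make the subspace idea concrete and runnable.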

Copyright information

© 2006 Springer

About this paper

Cite this paper

Evangelista, P.F., Embrechts, M.J., Szymanski, B.K. (2006). Taming the Curse of Dimensionality in Kernels and Novelty Detection. In: Abraham, A., de Baets, B., Köppen, M., Nickolay, B. (eds) Applied Soft Computing Technologies: The Challenge of Complexity. Advances in Soft Computing, vol 34. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-31662-0_33

  • DOI: https://doi.org/10.1007/3-540-31662-0_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-31649-7

  • Online ISBN: 978-3-540-31662-6

  • eBook Packages: Engineering, Engineering (R0)
