Taming the Curse of Dimensionality in Kernels and Novelty Detection

  • Conference paper

Part of the book series: Advances in Soft Computing (AINSC, volume 34)

Abstract

The curse of dimensionality is a well-known but not entirely well-understood phenomenon. Too much data, in terms of the number of input variables, is not always a good thing. This is especially true when the problem involves unsupervised learning or supervised learning with unbalanced data (many negative observations but few positive ones). This paper addresses two issues involving high-dimensional data. The first explores the behavior of kernels in high-dimensional data: it is shown that variance, especially when contributed by meaningless noisy variables, confounds learning methods. The second part illustrates methods to overcome dimensionality problems in unsupervised learning by utilizing subspace models. The modeling approach involves novelty detection with the one-class SVM.
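As a rough illustration of the first issue, the sketch below shows how Gaussian (RBF) kernel values lose contrast as meaningless noisy variables are added. The data layout, kernel choice, and the 1/d bandwidth heuristic are illustrative assumptions, not settings taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(X, gamma):
    # Pairwise Gaussian (RBF) kernel matrix for the rows of X.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

n_points, n_informative = 50, 5
signal = rng.normal(size=(n_points, n_informative))  # meaningful variables

for n_noise in (0, 20, 100, 500):
    noise = rng.normal(size=(n_points, n_noise))     # meaningless noisy variables
    X = np.hstack([signal, noise])
    K = rbf_kernel(X, gamma=1.0 / X.shape[1])        # common 1/d bandwidth heuristic
    off_diag = K[~np.eye(n_points, dtype=bool)]
    print(f"{n_noise:4d} noise dims: mean K = {off_diag.mean():.3f}, "
          f"std K = {off_diag.std():.4f}")
```

As the noise dimensions accumulate, the spread of the off-diagonal kernel values collapses, so every pair of points looks about equally similar and kernel-based learners lose discriminating power; the variance contributed by the noisy variables swamps the signal.

For the second part, the sketch below pairs a one-class SVM with a random-subspace scheme: many detectors are fit on small random subsets of the variables and their decision values are naively averaged. The subspace size, ensemble size, and averaging rule here are hypothetical choices for illustration; the paper's actual subspace models and fusion procedure may differ.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
n_noise = 200
# "Normal" training data plus novelties shifted only in the first five
# variables; the remaining columns are pure noise.
train = np.hstack([rng.normal(size=(300, 5)),
                   rng.normal(size=(300, n_noise))])
novel = np.hstack([rng.normal(loc=3.0, size=(30, 5)),
                   rng.normal(size=(30, n_noise))])

def novelty_score(train, test, cols):
    # Fit a one-class SVM on the chosen columns; lower scores = more novel.
    model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)
    model.fit(train[:, cols])
    return model.decision_function(test[:, cols])

all_cols = np.arange(train.shape[1])
# Random-subspace ensemble: 25 one-class SVMs, each on 10 random variables.
subspaces = [rng.choice(all_cols, size=10, replace=False) for _ in range(25)]
ensemble = np.mean([novelty_score(train, novel, s) for s in subspaces], axis=0)

print("full-space mean score on novelties:", novelty_score(train, novel, all_cols).mean())
print("subspace-ensemble mean score:     ", ensemble.mean())
```

Averaging raw decision values across subspaces is the crudest possible fusion; it is used here only to make the subspace idea concrete and runnable.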

Copyright information

© 2006 Springer

About this paper

Cite this paper

Evangelista, P.F., Embrechts, M.J., Szymanski, B.K. (2006). Taming the Curse of Dimensionality in Kernels and Novelty Detection. In: Abraham, A., de Baets, B., Köppen, M., Nickolay, B. (eds) Applied Soft Computing Technologies: The Challenge of Complexity. Advances in Soft Computing, vol 34. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-31662-0_33

  • DOI: https://doi.org/10.1007/3-540-31662-0_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-31649-7

  • Online ISBN: 978-3-540-31662-6

  • eBook Packages: Engineering, Engineering (R0)
