On the Robustness of Kernel-Based Clustering

  • Fabio A. González
  • David Bermeo
  • Laura Ramos
  • Olfa Nasraoui
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7441)

Abstract

This paper evaluates the robustness of two unsupervised learning methods that operate in feature spaces induced by a kernel function: kernel k-means and kernel symmetric non-negative matrix factorization. The main hypothesis is that the use of non-linear kernels makes these clustering algorithms more robust to noise and outliers. The hypothesis is tested by applying kernel and non-kernel versions of the algorithms to data contaminated with noisy samples to varying degrees. The results show that the kernel versions of the clustering algorithms are indeed more robust, i.e., they produce estimates with lower bias in the presence of noise.
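To make the first of the two evaluated algorithms concrete, the following is a minimal sketch of kernel k-means: because cluster means live in the kernel-induced feature space and cannot be represented explicitly, squared distances to each mean are computed entirely from the kernel matrix. The function names, the Gaussian-kernel choice, and all parameters below are illustrative assumptions, not the authors' implementation or experimental setup.

```python
import numpy as np

def gaussian_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_kmeans(K, n_clusters, n_iter=100, seed=0):
    """Lloyd-style kernel k-means on a precomputed kernel matrix K.

    Uses the identity
      ||phi(x_i) - m_k||^2 = K_ii - (2/|C_k|) * sum_{j in C_k} K_ij
                             + (1/|C_k|^2) * sum_{j,l in C_k} K_jl,
    so no explicit feature-space coordinates are ever needed.
    """
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    labels = rng.integers(n_clusters, size=n)   # random initial assignment
    diag = np.diag(K)
    for _ in range(n_iter):
        dist = np.empty((n, n_clusters))
        for k in range(n_clusters):
            mask = labels == k
            nk = mask.sum()
            if nk == 0:                         # empty cluster: make it unattractive
                dist[:, k] = np.inf
                continue
            dist[:, k] = (diag
                          - 2.0 * K[:, mask].sum(axis=1) / nk
                          + K[np.ix_(mask, mask)].sum() / nk**2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):  # converged
            break
        labels = new_labels
    return labels
```

With a non-linear kernel such as the Gaussian, distant outliers receive near-zero kernel values against all clusters, which is one intuition for the robustness claim the paper investigates.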

Keywords

Cluster Algorithm · Gaussian Kernel · Kernel Method · Robust Statistic · Probabilistic Latent Semantic Analysis


Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Fabio A. González (1)
  • David Bermeo (1)
  • Laura Ramos (1)
  • Olfa Nasraoui (2)

  1. BioIngenium Research Group, Universidad Nacional de Colombia, Bogotá, Colombia
  2. Knowledge Discovery & Web Mining Lab, The University of Louisville, USA
