PCA-Guided k-Means with Variable Weighting and Its Application to Document Clustering

  • Katsuhiro Honda
  • Akira Notsu
  • Hidetomo Ichihashi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5861)


PCA-guided k-Means is a deterministic approach to k-Means clustering in which cluster indicators are derived in a PCA-guided manner. This paper proposes a new approach to k-Means with variable selection by introducing a variable weighting mechanism into PCA-guided k-Means. The relative responsibility of each variable is estimated in a manner similar to fuzzy c-means (FCM) clustering, while the membership indicator is derived in a PCA-guided manner in which the principal component scores are calculated by taking the responsibility weights of the variables into account. As a result, variables that carry meaningful information for capturing cluster structure are emphasized in the calculation of membership indicators. Numerical experiments, including an application to document clustering, demonstrate the characteristics of the proposed method.
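
The abstract does not state the update equations, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes the two-cluster case, where the sign of the first principal component score serves as the cluster indicator, and it assumes an FCM-like weight update in which each variable's weight is inversely related to its within-cluster dispersion. The function name `pca_guided_weighted_2means`, the fuzzifier `theta`, and the stopping rule are all hypothetical choices made for this example.

```python
import numpy as np

def pca_guided_weighted_2means(X, theta=2.0, n_iter=20, eps=1e-12):
    """Two-cluster sketch: PCA-guided split with FCM-like variable weights.

    X is an (n_samples, n_features) array; theta > 1 is an assumed fuzzifier.
    Returns (labels, weights). Not the paper's exact formulation.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each variable
    n, m = Xc.shape
    w = np.full(m, 1.0 / m)                  # start from uniform weights
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # First principal component scores of the weight-scaled data.
        Z = Xc * np.sqrt(w)
        U, s, _ = np.linalg.svd(Z, full_matrices=False)
        scores = U[:, 0] * s[0]
        if scores[0] < 0:                    # fix the arbitrary sign of the SVD
            scores = -scores
        new_labels = (scores > 0).astype(int)
        if new_labels.min() == new_labels.max():
            break                            # degenerate split: stop early
        # Within-cluster dispersion of each variable (unweighted).
        D = np.zeros(m)
        for c in (0, 1):
            members = Xc[new_labels == c]
            D += ((members - members.mean(axis=0)) ** 2).sum(axis=0)
        # Assumed FCM-like rule: low dispersion gives a high weight; sum is 1.
        inv = (1.0 / (D + eps)) ** (1.0 / (theta - 1.0))
        w_new = inv / inv.sum()
        done = np.array_equal(new_labels, labels) and np.allclose(w_new, w)
        labels, w = new_labels, w_new
        if done:
            break
    return labels, w
```

Calling `pca_guided_weighted_2means(X)` on data with one clearly bimodal variable and several noise variables of comparable scale should assign the largest weight to the bimodal variable. Note that this simple rule also favors near-constant variables, so in practice the variables would need to be scaled or filtered beforehand.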


Variable Selection, Variable Weighting, Principal Component Score, Document Cluster, Membership Indicator
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Katsuhiro Honda (1)
  • Akira Notsu (1)
  • Hidetomo Ichihashi (1)
  1. Osaka Prefecture University, Osaka, Japan
