An Unsupervised Feature Selection Framework Based on Clustering

  • Sheng-yi Jiang
  • Lian-xi Wang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7104)


Feature selection plays an important part in improving the quality of learning algorithms in machine learning and data mining. It has been widely studied in supervised learning, but has received comparatively little attention in unsupervised learning. In this work, a clustering-based framework for unsupervised feature selection is proposed. The framework addresses the problem of identifying and choosing important features from the original feature set without class information; features are selected by ranking them according to their importance measure scores. Theoretical analysis indicates that the time complexity of each algorithm is nearly linear in the size of the dataset and in its number of features. Experimental results on UCI datasets show that the algorithms using different scores within the framework are able to identify the important features through clustering, and that the proposed algorithms obtain competitive results in terms of classification error rate and degree of dimensionality reduction when compared with state-of-the-art supervised and unsupervised feature selection approaches.
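The general idea of ranking features by a clustering-derived score can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the clustering step is plain k-means, the importance score is the between-cluster share of each feature's variance, and the names `kmeans`, `feature_scores`, and `rank_features` are hypothetical.

```python
import random
from statistics import mean


def kmeans(rows, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the clustering step).

    Returns one cluster label per row.
    """
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign each row to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(r, centers[c])))
            for r in rows
        ]
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [mean(col) for col in zip(*members)]
    return labels


def feature_scores(rows, labels):
    """Hypothetical importance score: the fraction of each feature's
    total variance that is explained by the cluster assignment."""
    scores = []
    for col in zip(*rows):
        overall = mean(col)
        total = sum((x - overall) ** 2 for x in col)
        between = 0.0
        for c in set(labels):
            vals = [x for x, lab in zip(col, labels) if lab == c]
            between += len(vals) * (mean(vals) - overall) ** 2
        scores.append(between / total if total > 0 else 0.0)
    return scores


def rank_features(rows, k):
    """Cluster without class labels, then rank feature indices by score."""
    labels = kmeans(rows, k)
    scores = feature_scores(rows, labels)
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy data: feature 0 separates two groups, feature 1 is unstructured.
demo_rows = [
    [0.0, 0.5], [0.1, 0.1], [0.2, 0.9], [0.3, 0.3],
    [10.0, 0.2], [10.1, 0.8], [10.2, 0.4], [10.3, 0.6],
]
demo_order, demo_scores = rank_features(demo_rows, k=2)
```

In this sketch the score for each feature is computed in a single pass over the data once the clusters are fixed, so the scoring step costs time proportional to the number of rows times the number of features. A top-ranked subset would then be taken from the front of `demo_order`.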


Keywords: Feature Selection, Unsupervised Learning, Feature Importance Measure Score, Clustering





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Sheng-yi Jiang (1)
  • Lian-xi Wang (1)
  1. School of Informatics, Guangdong University of Foreign Studies, Guangzhou, China
