An Unsupervised Feature Selection Framework Based on Clustering
Feature selection plays an important part in improving the quality of learning algorithms in machine learning and data mining. It has been widely studied in supervised learning, whereas it remains relatively little researched in unsupervised learning. In this work, a clustering-based framework for unsupervised feature selection algorithms is proposed. The framework addresses the problem of identifying and choosing important features from the original feature set without class information: features are ranked according to importance measure scores, and the top-ranked features are selected. Theoretical analysis indicates that the time complexity of each algorithm is nearly linear in both the size of the dataset and the number of features. Experimental results on UCI datasets show that algorithms instantiated with different scores in the framework are able to identify the important features through clustering, and that the proposed algorithms obtain competitive results in terms of classification error rate and degree of dimensionality reduction when compared with state-of-the-art supervised and unsupervised feature selection approaches.
Keywords: Feature Selection · Unsupervised Learning · Feature Importance Measure Score · Clustering
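The selection step described in the abstract, ranking features by an importance measure score computed without class labels and keeping the top-ranked ones, can be sketched as follows. This is a minimal illustration, not the paper's method: the `variance_score` function is a hypothetical stand-in, since the paper's actual importance measure scores are not reproduced here.

```python
import numpy as np

def rank_features(X, score_fn):
    """Rank feature indices by an importance score, highest first.

    X        : (n_samples, n_features) data matrix, no class labels used.
    score_fn : maps one feature column to a scalar importance score.
    """
    scores = np.array([score_fn(X[:, j]) for j in range(X.shape[1])])
    order = np.argsort(scores)[::-1]  # descending by score
    return order, scores

def variance_score(col):
    # Hypothetical stand-in score: treats higher-variance features as
    # more informative. The framework admits different score functions.
    return float(np.var(col))

# Toy data: feature 1 varies widely, features 0 and 2 are nearly constant.
X = np.array([[0.0, 10.0, 1.0],
              [0.1, -5.0, 1.1],
              [0.0,  7.0, 0.9],
              [0.1, -8.0, 1.0]])

order, scores = rank_features(X, variance_score)
top_k = order[:1]        # keep the single highest-scoring feature
X_reduced = X[:, top_k]  # reduced dataset, selected without class information
```

Because scoring each feature is a single pass over its column, the cost of this step grows linearly with both the number of samples and the number of features, consistent with the near-linear complexity claimed for the framework.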