Feature Selection Based on Fuzzy Conditional Distinction Degree
Previous studies have shown that information entropy and its variants are useful for reducing data dimensionality. However, most existing entropy-based approaches exploit only the correlations between features and labels and do not take the relevance between features into account. In this paper, we propose a new index for feature selection, named fuzzy conditional distinction degree (FDD), which is based on the fuzzy similarity relation and combines feature correlations with the relationship between features and labels. Unlike existing entropy-based approaches, FDD considers the cardinality of the relation matrix instead of the similarity classes. Meanwhile, we encode the feature correlations as a distance to measure the relevance of any two features. Some useful properties of FDD are discussed. Based on FDD, a greedy forward algorithm for feature selection is presented. Experimental results on benchmark data sets demonstrate the feasibility and effectiveness of the proposed approach.
Keywords: Feature selection; Fuzzy distinction degree; Dimension reduction
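The abstract does not give the FDD formula, so the following is only a minimal sketch of the general idea: a greedy forward search driven by a score computed from the entries of fuzzy similarity relation matrices. The similarity form `1 - |x_i - x_j|`, the element-wise minimum used to combine features, the score itself, and all function names are assumptions made for illustration; they are not the paper's definitions, and the feature-to-feature distance mentioned above is omitted.

```python
from itertools import combinations


def fuzzy_similarity(column):
    """Fuzzy similarity relation matrix for one feature scaled to [0, 1].

    Assumed form (not taken from the paper): R[i][j] = 1 - |x_i - x_j|.
    """
    n = len(column)
    return [[1.0 - abs(column[i] - column[j]) for j in range(n)]
            for i in range(n)]


def distinction_score(relations, labels):
    """Illustrative stand-in score, NOT the paper's FDD formula.

    The relation of a feature subset is taken as the element-wise minimum
    of the per-feature relations. The score is the mean of 1 - R[i][j]
    over sample pairs with different labels, so higher means the subset
    separates the classes better.
    """
    total, pairs = 0.0, 0
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] != labels[j]:
            total += 1.0 - min(r[i][j] for r in relations)
            pairs += 1
    return total / pairs if pairs else 0.0


def greedy_forward_selection(X, labels, k, tol=1e-12):
    """Add, one at a time, the feature that most improves the score.

    Stops after k features or when no candidate improves the score.
    """
    n_features = len(X[0])
    relations = [fuzzy_similarity([row[f] for row in X])
                 for f in range(n_features)]
    selected, best = [], 0.0
    while len(selected) < k:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        scores = {
            f: distinction_score([relations[g] for g in selected + [f]], labels)
            for f in candidates
        }
        f_best = max(candidates, key=lambda f: scores[f])
        if scores[f_best] <= best + tol:
            break  # no remaining feature adds discriminating power
        selected.append(f_best)
        best = scores[f_best]
    return selected


# Toy data: feature 0 separates the classes, feature 1 is constant,
# feature 2 is only weakly related to the label.
X = [
    [0.0, 0.5, 0.2],
    [0.0, 0.5, 0.8],
    [1.0, 0.5, 0.3],
    [1.0, 0.5, 0.7],
]
y = [0, 0, 1, 1]
chosen = greedy_forward_selection(X, y, k=2)
```

On this toy data the search picks feature 0 and then stops, because neither remaining feature raises the stand-in score. A real implementation would replace `distinction_score` with the FDD index as defined in the full paper.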
This work was partially supported by the National Natural Science Foundation of China (Nos. 61473259, 61502335, 61070074, 60703038) and the Hunan Provincial Science and Technology Project Foundation (2018TP1018, 2018RS3065).