Advertisement

Feature Selection Based on Fuzzy Conditional Distinction Degree

  • Qilai Zhang
  • Jianhua DaiEmail author
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11304)

Abstract

Previous studies have shown that information entropy and its variants are useful at reducing data dimensionality. Yet, most existing approaches based on entropy exploit the correlations between features and labels, lacking of taking into account the relevance between features. In this paper, we propose a new index for feature selection, named fuzzy conditional distinction degree (FDD), based on fuzzy similarity relation by combining feature correlations with the relationship between features and labels. Different from existing approaches based on entropy, FDD considers the cardinality of the relation matrix instead of the similarity classes. Meanwhile, we encode the feature correlations into distance to measure the relevance of any two features. Some useful properties are discussed. Based on the FDD, a greedy forward algorithm for feature selection is presented. Experimental results on benchmark data sets denote the feasibility and effectiveness of the proposed approach.

Keywords

Feature selection Fuzzy distinction degree Dimension reduction 

Notes

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (Nos. 61473259, 61502335, 61070074, 60703038) and the Hunan Provincial Science and Technology Project Foundation (2018TP1018, 2018RS3065).

References

  1. 1.
    Battiti, R.: Using mutual information for selecting features in supervised neural net learning. IEEE Trans. Neural Netw. 5(4), 537–550 (1994)CrossRefGoogle Scholar
  2. 2.
    Dai, J., Wang, W., Xu, Q.: An uncertainty measure for incomplete decision tables and its applications. IEEE Trans. Cybern. 43(4), 1277–1289 (2013)CrossRefGoogle Scholar
  3. 3.
    Dai, J., Xu, Q.: Attribute selection based on information gain ratio in fuzzy rough set theory with application to tumor classification. Appl. Soft Comput. J. 13(1), 211–221 (2013)CrossRefGoogle Scholar
  4. 4.
    Dai, J., Xu, Q., Wang, W., Tian, H.: Conditional entropy for incomplete decision systems and its application in data mining. Int. J. Gen. Syst. 41(7), 713–728 (2012)MathSciNetCrossRefGoogle Scholar
  5. 5.
    Demsar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)MathSciNetzbMATHGoogle Scholar
  6. 6.
    Hall, M.A.: Correlation-based feature selection for discrete and numeric class machine learning. In: Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), pp. 359–366 (2000)Google Scholar
  7. 7.
    Hu, Q., Yu, D., Xie, Z., Liu, J.: Fuzzy probabilistic approximation spaces and their information measures. IEEE Trans. Fuzzy Syst. 14(2), 191–201 (2006)CrossRefGoogle Scholar
  8. 8.
    Hu, Q., Zhang, L., Zhang, D., Pan, W., An, S., Pedrycz, W.: Measuring relevance between discrete and continuous features based on neighborhood mutual information. Expert Syst. Appl. 38(9), 10737–10750 (2011)CrossRefGoogle Scholar
  9. 9.
    Jensen, R., Shen, Q.: New approaches to fuzzy-rough feature selection. IEEE Trans. Fuzzy Syst. 17(4), 824–838 (2009)CrossRefGoogle Scholar
  10. 10.
    Tallón-Ballesteros, A.J., Riquelme, J.C.: Tackling ant colony optimization meta-heuristic as search method in feature subset selection based on correlation or consistency measures. In: Corchado, E., Lozano, J.A., Quintián, H., Yin, H. (eds.) IDEAL 2014. LNCS, vol. 8669, pp. 386–393. Springer, Cham (2014).  https://doi.org/10.1007/978-3-319-10840-7_47CrossRefGoogle Scholar
  11. 11.
    Tang, J., Liu, H.: Unsupervised feature selection for linked social media data. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 904–912 (2012)Google Scholar
  12. 12.
    Tiwari, A.K., Shreevastava, S., Som, T., Shukla, K.K.: Tolerance-based intuitionistic fuzzy-rough set approach for attribute reduction. Expert Syst. Appl. 101, 205–212 (2018)CrossRefGoogle Scholar
  13. 13.
    Wang, C., Hu, Q., Wang, X., Chen, D., Qian, Y., Dong, Z.: Feature selection based on neighborhood discrimination index. IEEE Trans. Neural Netw. Learn. Syst. 29(7), 2986–2999 (2017)MathSciNetGoogle Scholar
  14. 14.
    Wang, C., Qi, Y., Shao, M., Hu, Q., Chen, D., Qian, Y., Lin, Y.: A fitting model for feature selection with fuzzy rough sets. IEEE Trans. Fuzzy Syst. 25(4), 741–753 (2017)CrossRefGoogle Scholar
  15. 15.
    Wang, C., Shao, M., He, Q., Qian, Y., Qi, Y.: Feature subset selection based on fuzzy neighborhood rough sets. Knowl.-Based Syst. 111, 173–179 (2016)CrossRefGoogle Scholar
  16. 16.
    Witten, I.H., Eibe, F., Hall, M.A.: Data Mining: Practical Machine Learning Tools and Techniques, 3rd edn. Morgan Kaufmann, Elsevier, Burlington (2011)zbMATHGoogle Scholar
  17. 17.
    Yager, R.R.: Entropy measures under similarity relations. Int. J. Gen. Syst. 20(4), 341–358 (1992)CrossRefGoogle Scholar
  18. 18.
    Yu, L., Liu, H.: Feature selection for high-dimensional data: A fast correlation-based filter solution. In: Machine Learning, Proceedings of the Twentieth International Conference (ICML 2003), August 21–24, 2003, Washington, DC, pp. 856–863 (2003)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. 1.School of Computer Science and Technology, Tianjin UniversityTianjinChina
  2. 2.Hunan Provincial Key Laboratory of Intelligent Computing and Language Information ProcessingCollege of Information Science and Engineering, Hunan Normal UniversityChangshaChina

Personalised recommendations