Finding the Optimal Number of Features Based on Mutual Information

  • Peipei Chen
  • Anna Wilbik
  • Saskia van Loon
  • Arjen-Kars Boer
  • Uzay Kaymak
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 641)


For high-dimensional data analytics, feature selection is an indispensable preprocessing step that reduces dimensionality and keeps models simple and interpretable. This is particularly important for fuzzy modeling, since fuzzy models are widely recognized for their transparency and interpretability. Despite the substantial work on feature selection, there is little research on determining the optimal number of features for a task. In this paper, we propose a method based on mutual information that helps find the optimal number of features effectively.
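
As background for the quantity the abstract relies on, the following is a minimal sketch of how mutual information between a discrete feature and a class label can be estimated from counts and used to rank features. It is not the method proposed in the paper, which the abstract does not describe. The function names (`mutual_information`, `rank_features`) and the toy data are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, of two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), expressed with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(columns, labels):
    """Return (feature name, MI) pairs sorted by decreasing MI with the labels."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data, made up for illustration only (not from the paper).
labels = [0, 0, 1, 1, 0, 0, 1, 1]
columns = {
    "copy_of_label": [0, 0, 1, 1, 0, 0, 1, 1],  # fully informative
    "noisy":         [0, 0, 1, 1, 0, 1, 1, 0],  # partially informative
    "independent":   [0, 1, 0, 1, 0, 1, 0, 1],  # carries no information
}
ranking = rank_features(columns, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f} bits")
```

A ranking of this kind only orders the features. Deciding how many of them to keep is the question the paper addresses, and this sketch does not attempt to answer it.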


Feature selection, Mutual information, Number of features, Fuzzy models



This work is partially supported by Philips Research within the scope of the BrainBridge Program.



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Peipei Chen (1, 2)
  • Anna Wilbik (2)
  • Saskia van Loon (3)
  • Arjen-Kars Boer (3)
  • Uzay Kaymak (1, 2)

  1. College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China
  2. Information Systems, School of Industrial Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
  3. Clinical Chemistry, Catharina Hospital, Eindhoven, The Netherlands
