Applied Intelligence, Volume 49, Issue 3, pp 883–896

Feature selection based on conditional mutual information: minimum conditional relevance and minimum conditional redundancy

  • HongFang Zhou
  • Yao Zhang
  • YingJie Zhang
  • HongJiang Liu


Feature selection is the process of selecting important features from the original feature set. Many existing feature selection algorithms based on information theory concentrate on maximizing relevance and minimizing redundancy. In this paper, relevance and redundancy are extended to conditional relevance and conditional redundancy. Because the two conditional relations take the already-selected features into account, they tend to characterize feature relations more accurately. A new framework integrating the two conditional relations is built in this paper, and two new feature selection methods are proposed: Minimum Conditional Relevance-Minimum Conditional Redundancy (MCRMCR) and Minimum Conditional Relevance-Minimum Intra-Class Redundancy (MCRMICR). The proposed methods select features with high class relevance and low redundancy. Experimental results on twelve datasets verify that the proposed methods perform better at feature selection and achieve high classification accuracy.


Keywords: Feature selection · Mutual information · Conditional redundancy · Intra-class redundancy
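The selection strategy described in the abstract can be illustrated with a greedy forward search driven by conditional mutual information. The sketch below is not the paper's exact MCRMCR/MCRMICR criterion (those are defined in the full text); it is a minimal illustration, assuming discrete-valued features, in which each step adds the candidate feature f maximizing I(f; Y | S), i.e. its class relevance conditioned on the already-selected set S. The names `entropy`, `cond_mi`, and `greedy_cmi_select` are illustrative, not from the paper.

```python
import numpy as np
from collections import Counter


def entropy(cols):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    joint = list(zip(*cols))                       # one tuple per sample
    p = np.array(list(Counter(joint).values())) / len(joint)
    return float(-np.sum(p * np.log2(p)))


def cond_mi(x, y, z=None):
    """I(X; Y | Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z).

    With an empty conditioning set this reduces to plain mutual
    information I(X; Y) = H(X) + H(Y) - H(X,Y).
    """
    if z is None or len(z) == 0:
        return entropy([x]) + entropy([y]) - entropy([x, y])
    return (entropy([x] + z) + entropy([y] + z)
            - entropy([x, y] + z) - entropy(z))


def greedy_cmi_select(X, y, k):
    """Greedy forward selection: at each step, add the feature with the
    highest conditional mutual information with the class labels y,
    given the features selected so far."""
    n_features = X.shape[1]
    selected = []
    while len(selected) < k:
        z = [X[:, j].tolist() for j in selected]   # conditioning set S
        best = max((j for j in range(n_features) if j not in selected),
                   key=lambda j: cond_mi(X[:, j].tolist(), y.tolist(), z))
        selected.append(best)
    return selected
```

For example, on a toy dataset where feature 0 perfectly determines the class and the other features are uninformative, the first feature chosen is index 0. The counting-based entropy estimator here is only practical for small discrete datasets; the paper's experiments rely on the same information-theoretic quantities but with its own selection criteria.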



The corresponding author gratefully acknowledges the support of the National Natural Science Foundation of China under Grant 61402363, the Education Department of Shaanxi Province Key Laboratory Project under Grant 15JS079, the Xi’an Science Program Project under Grant 2017080CG/RC043(XALG017), the Ministry of Education of Shaanxi Province Research Project under Grant 17JK0534, and the Beilin District of Xi’an Science and Technology Project under Grant GX1625.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • HongFang Zhou 1, 2 (corresponding author)
  • Yao Zhang 1
  • YingJie Zhang 1
  • HongJiang Liu 1

  1. School of Computer Science and Engineering, Xi’an University of Technology, Xi’an, China
  2. Shaanxi Key Laboratory of Network Computing and Security Technology, Xi’an, China
