Journal of Zhejiang University-SCIENCE A, Volume 5, Issue 11, pp 1382–1391

Feature selection based on mutual information and redundancy-synergy coefficient

  • Sheng Yang
  • Jun Gu
Computer & Information Science


Mutual information is an important information-theoretic measure for evaluating feature subsets. In this paper, a hashing mechanism is proposed for calculating the mutual information on a feature subset. The redundancy-synergy coefficient, a novel measure of the redundancy and synergy among features in expressing the class feature, is defined in terms of mutual information. The information maximization rule is applied to derive a heuristic feature subset selection method based on mutual information and the redundancy-synergy coefficient. Experimental results show that the new feature selection method performs well.
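As an illustration of how mutual information on a feature subset can be estimated with a hash table, the following Python sketch keys a dictionary on the tuple of values that the subset takes in each sample. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `subset_mutual_information`, the plug-in (empirical frequency) estimator, and the toy XOR data are all hypothetical choices made here, and the redundancy-synergy coefficient itself is not reproduced because the abstract does not give its definition.

```python
from collections import Counter
from math import log2


def subset_mutual_information(rows, labels, subset):
    """Estimate I(S; C) in bits from empirical counts.

    rows   : sequence of samples, each a sequence of discrete feature values
    labels : sequence of class values, one per sample
    subset : indices of the features that form the subset S
    (All names are illustrative assumptions, not taken from the paper.)
    """
    n = len(rows)
    joint = Counter()       # (subset value tuple, class) -> count
    feat = Counter()        # subset value tuple -> count
    cls = Counter(labels)   # class -> count

    for row, c in zip(rows, labels):
        # The tuple of selected values is hashable, so each distinct
        # value combination of the subset gets one hash-table entry.
        key = tuple(row[i] for i in subset)
        joint[(key, c)] += 1
        feat[key] += 1

    mi = 0.0
    for (key, c), n_kc in joint.items():
        # p(k, c) * log2( p(k, c) / (p(k) * p(c)) ), written with counts
        mi += (n_kc / n) * log2(n_kc * n / (feat[key] * cls[c]))
    return mi


# Toy data (assumed for illustration): the class is the XOR of two features.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

mi_single = subset_mutual_information(rows, labels, [0])
mi_pair = subset_mutual_information(rows, labels, [0, 1])
print(mi_single, mi_pair)
```

On this toy data a single feature carries no information about the class, while the pair determines it completely, which is the kind of synergy between features that a subset-level measure can capture and a per-feature measure cannot.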

Key words

Mutual information, Feature selection, Machine learning, Data mining



Copyright information

© Zhejiang University Press 2004

Authors and Affiliations

  1. Institute of Image Processing & Pattern Recognition, Shanghai Jiaotong University, Shanghai, China
  2. Department of Computer Science, Hongkong University of Science and Technology, Hongkong, China
