
A Hybrid Genetic Algorithm for Feature Selection Based on Mutual Information

Chapter in: Information Theory and Statistical Learning

Feature selection aims to reduce the dimensionality of patterns for classificatory analysis by selecting the most informative features rather than irrelevant and/or redundant ones. In this study, a hybrid genetic algorithm for feature selection is presented that combines the advantages of both wrappers and filters. Two stages of optimization are involved. The outer stage completes a global search for the best subset of features in a wrapper manner, in which the mutual information between the predicted labels of a trained classifier and the true classes serves as the fitness function for the genetic algorithm. The inner stage performs a local search in a filter manner, in which an improved estimate of the conditional mutual information acts as an independent measure for feature ranking. This measure takes into account not only the relevance of a candidate feature to the output classes but also its redundancy with the features already selected. The inner and outer optimizations cooperate with each other to achieve high global predictive accuracy as well as high local search efficiency. Experimental results demonstrate both the parsimonious feature selection and the excellent classification accuracy of the method on a range of benchmark data sets.
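The outer wrapper stage described above can be sketched in code: a genetic algorithm evolves binary feature masks, and the fitness of a mask is the empirical mutual information between a classifier's predictions on held-out data and the true classes. This is an illustrative sketch only, not the authors' implementation: the 1-nearest-neighbour classifier, the function names (`ga_select`, `knn_predict`), and all parameter values are assumptions, and the inner conditional-mutual-information local search from the chapter is omitted for brevity.

```python
import numpy as np

def mutual_info(x, y):
    """Empirical mutual information (in nats) between two discrete label arrays."""
    xs, x_idx = np.unique(x, return_inverse=True)
    ys, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((len(xs), len(ys)))
    for i, j in zip(x_idx, y_idx):
        joint[i, j] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def knn_predict(Xtr, ytr, Xte):
    """1-nearest-neighbour prediction; a stand-in for any trained classifier."""
    d = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(axis=2)
    return ytr[d.argmin(axis=1)]

def fitness(mask, Xtr, ytr, Xva, yva):
    """Wrapper fitness: MI between predicted labels and true classes on validation data."""
    if not mask.any():
        return -np.inf
    pred = knn_predict(Xtr[:, mask], ytr, Xva[:, mask])
    return mutual_info(pred, yva)

def ga_select(Xtr, ytr, Xva, yva, pop_size=20, gens=30, p_mut=0.05, seed=0):
    """Evolve binary feature masks; return the best mask found."""
    rng = np.random.default_rng(seed)
    n = Xtr.shape[1]
    pop = rng.random((pop_size, n)) < 0.5
    for _ in range(gens):
        fit = np.array([fitness(m, Xtr, ytr, Xva, yva) for m in pop])
        # Binary tournament selection.
        a, b = rng.integers(0, pop_size, (2, pop_size))
        parents = pop[np.where(fit[a] >= fit[b], a, b)]
        # Uniform crossover between shuffled parent pairs, then bit-flip mutation.
        cross = rng.random((pop_size, n)) < 0.5
        children = np.where(cross, parents, parents[rng.permutation(pop_size)])
        children ^= rng.random((pop_size, n)) < p_mut
        # Elitism: the best mask so far survives unchanged.
        children[0] = pop[fit.argmax()]
        pop = children
    fit = np.array([fitness(m, Xtr, ytr, Xva, yva) for m in pop])
    return pop[fit.argmax()]
```

In the chapter's full method, each generation would additionally refine promising masks with the conditional-MI filter ranking (relevance to the classes minus redundancy with already-selected features) before the next round of selection.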



Author information

Correspondence to Jinjie Huang or Panxiang Rong.


Copyright information

© 2009 Springer Science+Business Media, LLC

About this chapter

Huang, J., Rong, P. (2009). A Hybrid Genetic Algorithm for Feature Selection Based on Mutual Information. In: Emmert-Streib, F., Dehmer, M. (eds) Information Theory and Statistical Learning. Springer, Boston, MA.

  • DOI: https://doi.org/10.1007/978-0-387-84816-7_6
  • Print ISBN: 978-0-387-84815-0
  • Online ISBN: 978-0-387-84816-7
