
A Hybrid Genetic Algorithm for Feature Selection Based on Mutual Information

Chapter in: Information Theory and Statistical Learning

Feature selection aims to reduce the dimensionality of patterns for classificatory analysis by selecting the most informative features rather than irrelevant and/or redundant ones. In this study, a hybrid genetic algorithm for feature selection is presented that combines the advantages of both wrappers and filters. Two stages of optimization are involved. The outer stage completes a global search for the best subset of features in a wrapper manner, in which the mutual information between the predicted labels of a trained classifier and the true classes serves as the fitness function for the genetic algorithm. The inner stage performs a local search in a filter manner, in which an improved estimate of the conditional mutual information acts as an independent measure for feature ranking. This measure takes into account not only the relevance of a candidate feature to the output classes but also its redundancy with the features already selected. The inner and outer optimizations cooperate with each other to achieve high global predictive accuracy as well as high local search efficiency. Experimental results demonstrate both the parsimonious feature selection and the excellent classification accuracy of the method on a range of benchmark data sets.
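The outer wrapper stage described above can be sketched in code: a genetic algorithm evolves binary feature masks, and the fitness of a mask is the empirical mutual information between a classifier's predictions on held-out data and the true classes. This is an illustrative sketch only, not the authors' implementation: the 1-nearest-neighbour classifier, the function names (`ga_select`, `knn_predict`), and all parameter values are assumptions, and the inner conditional-mutual-information local search from the chapter is omitted for brevity.

```python
import numpy as np

def mutual_info(x, y):
    """Empirical mutual information (in nats) between two discrete label arrays."""
    xs, x_idx = np.unique(x, return_inverse=True)
    ys, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((len(xs), len(ys)))
    for i, j in zip(x_idx, y_idx):
        joint[i, j] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def knn_predict(Xtr, ytr, Xte):
    """1-nearest-neighbour prediction; a stand-in for any trained classifier."""
    d = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(axis=2)
    return ytr[d.argmin(axis=1)]

def fitness(mask, Xtr, ytr, Xva, yva):
    """Wrapper fitness: MI between predicted labels and true classes on validation data."""
    if not mask.any():
        return -np.inf
    pred = knn_predict(Xtr[:, mask], ytr, Xva[:, mask])
    return mutual_info(pred, yva)

def ga_select(Xtr, ytr, Xva, yva, pop_size=20, gens=30, p_mut=0.05, seed=0):
    """Evolve binary feature masks; return the best mask found."""
    rng = np.random.default_rng(seed)
    n = Xtr.shape[1]
    pop = rng.random((pop_size, n)) < 0.5
    for _ in range(gens):
        fit = np.array([fitness(m, Xtr, ytr, Xva, yva) for m in pop])
        # Binary tournament selection.
        a, b = rng.integers(0, pop_size, (2, pop_size))
        parents = pop[np.where(fit[a] >= fit[b], a, b)]
        # Uniform crossover between shuffled parent pairs, then bit-flip mutation.
        cross = rng.random((pop_size, n)) < 0.5
        children = np.where(cross, parents, parents[rng.permutation(pop_size)])
        children ^= rng.random((pop_size, n)) < p_mut
        # Elitism: the best mask so far survives unchanged.
        children[0] = pop[fit.argmax()]
        pop = children
    fit = np.array([fitness(m, Xtr, ytr, Xva, yva) for m in pop])
    return pop[fit.argmax()]
```

In the chapter's full method, each generation would additionally refine promising masks with the conditional-MI filter ranking (relevance to the classes minus redundancy with already-selected features) before the next round of selection.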



Author information

Correspondence to Jinjie Huang or Panxiang Rong.


Copyright information

© 2009 Springer Science+Business Media, LLC

About this chapter

Huang, J., Rong, P. (2009). A Hybrid Genetic Algorithm for Feature Selection Based on Mutual Information. In: Emmert-Streib, F., Dehmer, M. (eds) Information Theory and Statistical Learning. Springer, Boston, MA.

  • DOI: https://doi.org/10.1007/978-0-387-84816-7_6
  • Print ISBN: 978-0-387-84815-0
  • Online ISBN: 978-0-387-84816-7
