Feature selection aims to reduce the dimensionality of patterns for classificatory analysis by selecting the most informative features rather than irrelevant and/or redundant ones. This study presents a hybrid genetic algorithm for feature selection that combines the advantages of wrappers and filters. It involves two stages of optimization. The outer stage performs a global search for the best feature subset in wrapper fashion, using the mutual information between the predicted labels of a trained classifier and the true classes as the fitness function of the genetic algorithm. The inner stage performs a local search in filter fashion, using an improved estimate of the conditional mutual information as an independent feature-ranking measure. This measure accounts not only for the relevance of a candidate feature to the output classes but also for its redundancy with the features already selected. The inner and outer optimizations cooperate, achieving both high global predictive accuracy and high local search efficiency. Experimental results on a range of benchmark data sets demonstrate both parsimonious feature selection and excellent classification accuracy.
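The two measures at the heart of the method can be sketched in a few lines. The following is a minimal illustration, not the chapter's implementation: `wrapper_fitness`, `filter_rank`, and `train_and_predict` are hypothetical names, and the inner-stage score shown here is the common relevance-minus-redundancy MI approximation, standing in for the chapter's improved conditional mutual information estimate.

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information (in bits) between two discrete label vectors."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))   # joint probability
            px, py = np.mean(x == xv), np.mean(y == yv)
            if pxy > 0:
                mi += pxy * np.log2(pxy / (px * py))
    return mi

def wrapper_fitness(subset_mask, X, y, train_and_predict):
    """Outer-stage fitness: I(predicted labels; true classes) for a classifier
    trained on the selected feature subset (wrapper evaluation)."""
    cols = np.flatnonzero(subset_mask)
    if cols.size == 0:
        return 0.0
    y_pred = train_and_predict(X[:, cols], y)
    return mutual_information(y_pred, y)

def filter_rank(candidate_idx, selected, X_disc, y):
    """Inner-stage filter score on discretized features: relevance I(f; y)
    minus average redundancy with already-selected features."""
    f = X_disc[:, candidate_idx]
    score = mutual_information(f, y)
    if selected:
        score -= np.mean([mutual_information(f, X_disc[:, j]) for j in selected])
    return score
```

A candidate subset from the GA population is scored with `wrapper_fitness`, while the local search around it adds or drops individual features according to `filter_rank`, so the expensive classifier training is reserved for the outer loop.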
© 2009 Springer Science+Business Media, LLC
Huang, J., Rong, P. (2009). A Hybrid Genetic Algorithm for Feature Selection Based on Mutual Information. In: Emmert-Streib, F., Dehmer, M. (eds) Information Theory and Statistical Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-84816-7_6
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-84815-0
Online ISBN: 978-0-387-84816-7