Abstract
In this paper we propose a novel filter for feature selection. The filter relies on estimating the mutual information between features and classes. We bypass the estimation of the probability density function by means of the entropic-graph approximation of the Rényi entropy and a subsequent approximation of the Shannon entropy. The complexity of this bypass depends on the number of samples rather than on the number of dimensions, so the curse of dimensionality is circumvented. We show that it is then possible to outperform a greedy algorithm based on the maximal-relevance minimal-redundancy (mRMR) criterion. We successfully test our method on both image classification and microarray data classification.
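The entropic-graph estimator alluded to in the abstract can be sketched as follows. This is not the authors' implementation: it is a minimal illustration of the MST-based Rényi α-entropy estimate (total length of the minimal spanning tree over the samples, with α = (d − γ)/d), assuming Euclidean edge weights with γ = 1 and omitting the additive constant β, so values are comparable only across data sets of the same dimension and size. The function name and defaults are illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def renyi_entropy_mst(X, gamma=1.0):
    """MST-based estimate of the Renyi alpha-entropy of the sample X
    (n patterns x d features), with alpha = (d - gamma) / d.

    The additive constant beta(gamma, d) is omitted, so the result is
    an entropy estimate up to a constant offset. Requires d > gamma.
    Cost is driven by the number of samples n, not the dimension d.
    """
    n, d = X.shape
    alpha = (d - gamma) / d
    # Pairwise Euclidean distances raised to the power gamma.
    D = squareform(pdist(X)) ** gamma
    # Total weighted length of the minimal spanning tree over the samples.
    L = minimum_spanning_tree(D).sum()
    # Entropic-graph estimate: log(L / n^alpha) / (1 - alpha).
    return np.log(L / n**alpha) / (1.0 - alpha)
```

Since the estimator only needs pairwise distances and an MST, no density estimate is ever formed; a spread-out sample (e.g. the same points scaled up) yields a larger entropy estimate, as expected.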
© 2007 Springer-Verlag Berlin Heidelberg
Bonev, B., Escolano, F., Cazorla, M.A. (2007). A Novel Information Theory Method for Filter Feature Selection. In: Gelbukh, A., Kuri Morales, Á.F. (eds) MICAI 2007: Advances in Artificial Intelligence. MICAI 2007. Lecture Notes in Computer Science(), vol 4827. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76631-5_41
Print ISBN: 978-3-540-76630-8
Online ISBN: 978-3-540-76631-5