Abstract
Mutual information is one of the most popular criteria used in feature selection, and many techniques have been proposed to estimate it. The large majority of them are based on probability density estimation and perform poorly when faced with high-dimensional data, because of the curse of dimensionality. However, being able to robustly evaluate the mutual information between a subset of features and an output vector can be of great interest in feature selection. This is particularly the case when some features are only jointly redundant or relevant. In this paper, different mutual information estimators are compared according to criteria that matter for feature selection; the interest of a nearest neighbors-based estimator is shown.
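The nearest neighbors-based estimator highlighted in the abstract is commonly the Kraskov–Stögbauer–Grassberger (KSG) estimator, which avoids explicit density estimation. The following is an illustrative sketch of its first variant, assuming NumPy and SciPy are available; it is not the authors' code, and the brute-force distance computation is only suitable for small sample sizes:

```python
import numpy as np
from scipy.special import digamma

def kraskov_mi(x, y, k=3):
    """Sketch of the KSG (algorithm 1) mutual information estimator, in nats.

    For each point, eps is the Chebyshev distance to its k-th nearest
    neighbour in the joint (x, y) space; n_x and n_y count the neighbours
    strictly within eps in each marginal space.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = len(x)
    # Pairwise max-norm distances in each marginal space and in the joint space
    dx = np.max(np.abs(x[:, None, :] - x[None, :, :]), axis=-1)
    dy = np.max(np.abs(y[:, None, :] - y[None, :, :]), axis=-1)
    dz = np.maximum(dx, dy)
    np.fill_diagonal(dz, np.inf)  # a point is not its own neighbour
    # Distance to the k-th nearest neighbour in the joint space
    eps = np.sort(dz, axis=1)[:, k - 1]
    # Counts in the marginal spaces (subtract 1 to exclude the point itself)
    nx = np.sum(dx < eps[:, None], axis=1) - 1
    ny = np.sum(dy < eps[:, None], axis=1) - 1
    mi = digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))
    return max(mi, 0.0)

# Usage: for a bivariate Gaussian with correlation rho, the true MI is
# -0.5 * log(1 - rho**2); the estimate should land close to it.
rng = np.random.default_rng(0)
n_samples, rho = 1000, 0.8
x = rng.standard_normal(n_samples)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n_samples)
true_mi = -0.5 * np.log(1 - rho**2)  # about 0.51 nats
est = kraskov_mi(x, y)
```

Because no density is estimated explicitly, this estimator degrades much more gracefully with dimension than histogram- or kernel-based approaches, which is the property the paper exploits for multivariate feature selection.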
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Doquire, G., Verleysen, M. (2013). A Performance Evaluation of Mutual Information Estimators for Multivariate Feature Selection. In: Latorre Carmona, P., Sánchez, J., Fred, A. (eds) Pattern Recognition - Applications and Methods. Advances in Intelligent Systems and Computing, vol 204. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36530-0_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36529-4
Online ISBN: 978-3-642-36530-0
eBook Packages: Engineering (R0)