A Performance Evaluation of Mutual Information Estimators for Multivariate Feature Selection

  • Conference paper
In: Pattern Recognition - Applications and Methods

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 204)

Abstract

Mutual information is one of the most popular criteria used in feature selection, and many techniques have been proposed to estimate it. The large majority of them rely on probability density estimation and perform poorly on high-dimensional data because of the curse of dimensionality. However, being able to robustly evaluate the mutual information between a subset of features and an output vector is of great interest in feature selection, particularly when some features are only jointly redundant or relevant. In this paper, different mutual information estimators are compared according to criteria that are important for feature selection, and the value of a nearest-neighbor-based estimator is shown.
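To make the idea of jointly relevant features concrete, here is a minimal sketch of one well-known nearest-neighbor estimator, the Kraskov-Stögbauer-Grassberger (KSG) estimator, in its first variant. The abstract does not specify which variant the paper evaluates, so treating KSG as representative is an assumption. The function names, the toy data, and the choices `k=3` and `n=300` are all hypothetical and chosen for illustration only.

```python
import heapq
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer n: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def chebyshev(a, b):
    """Maximum-norm distance between two equal-length tuples."""
    return max(abs(p - q) for p, q in zip(a, b))


def ksg_mutual_information(xs, ys, k=3):
    """KSG estimate (variant 1, in nats) of I(X; Y).

    xs and ys are equal-length lists of tuples, so X may be a whole
    feature subset. Brute-force O(n^2) search; assumes no tied distances.
    """
    n = len(xs)
    total = 0.0
    for i in range(n):
        dx = [chebyshev(xs[i], xs[j]) for j in range(n)]
        dy = [chebyshev(ys[i], ys[j]) for j in range(n)]
        # Distance to the k-th nearest neighbor in the joint (X, Y) space.
        joint = [max(dx[j], dy[j]) for j in range(n) if j != i]
        eps = heapq.nsmallest(k, joint)[-1]
        # Count marginal neighbors strictly closer than eps.
        n_x = sum(1 for j in range(n) if j != i and dx[j] < eps)
        n_y = sum(1 for j in range(n) if j != i and dy[j] < eps)
        total += digamma_int(n_x + 1) + digamma_int(n_y + 1)
    return digamma_int(k) + digamma_int(n) - total / n


# Toy XOR-like problem (hypothetical data): y depends on the sign of
# x1 * x2, so neither feature is informative alone, but the pair is.
rng = random.Random(0)
n = 300
x1 = [rng.uniform(-1, 1) for _ in range(n)]
x2 = [rng.uniform(-1, 1) for _ in range(n)]
y = [(1.0 if a * b > 0 else -1.0) + rng.gauss(0, 0.1) for a, b in zip(x1, x2)]
targets = [(v,) for v in y]

mi_single = ksg_mutual_information([(a,) for a in x1], targets)
mi_joint = ksg_mutual_information(list(zip(x1, x2)), targets)
print("I(x1; y)     ~", round(mi_single, 3))
print("I(x1, x2; y) ~", round(mi_joint, 3))
```

On this toy problem the single-feature estimate should stay near zero while the estimate for the pair is clearly positive, which is the kind of behavior a univariate criterion cannot capture. The estimator works directly on neighbor counts rather than on an explicit density estimate, which is the usual argument for preferring it as dimension grows.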

Author information

Corresponding author

Correspondence to Gauthier Doquire.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Doquire, G., Verleysen, M. (2013). A Performance Evaluation of Mutual Information Estimators for Multivariate Feature Selection. In: Latorre Carmona, P., Sánchez, J., Fred, A. (eds) Pattern Recognition - Applications and Methods. Advances in Intelligent Systems and Computing, vol 204. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36530-0_5

  • DOI: https://doi.org/10.1007/978-3-642-36530-0_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36529-4

  • Online ISBN: 978-3-642-36530-0

  • eBook Packages: Engineering, Engineering (R0)
