Information Measures

Part of the SpringerBriefs in Computer Science book series (BRIEFSCOMPUTER)


Information theoretic learning (ITL) was initiated in the late 1990s at CNEL [126]. It uses descriptors from information theory (entropy and divergences), estimated directly from the data, in place of the conventional statistical descriptors of variance and covariance. ITL can be applied to the adaptation of linear or nonlinear filters, as well as to supervised and unsupervised machine learning. In this chapter, we introduce two commonly used differential entropies for data understanding, and information theoretic measures (ITMs) for the evaluation of abstaining classifications.
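As a concrete illustration of estimating an entropy descriptor directly from data, the sketch below computes Rényi's quadratic entropy of a one-dimensional sample using a Gaussian Parzen (kernel) density estimator, a standard construction in ITL. The function name and the kernel width `sigma` are illustrative choices, not notation from this chapter.

```python
import numpy as np

def renyi_quadratic_entropy(x, sigma=1.0):
    """Renyi's quadratic entropy H2 = -log(IP), where the information
    potential IP = (1/N^2) * sum_ij G(x_i - x_j) is estimated with a
    Gaussian Parzen window. The product of two Gaussian kernels of width
    sigma integrates to a single Gaussian of variance 2*sigma^2."""
    x = np.asarray(x, dtype=float)
    n = x.size
    diffs = x[:, None] - x[None, :]              # all pairwise differences
    var = 2.0 * sigma ** 2                       # convolved kernel variance
    kernel = np.exp(-diffs ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    ip = kernel.sum() / n ** 2                   # information potential
    return -np.log(ip)
```

Because the pairwise kernel sum is largest when samples cluster together, a more spread-out sample yields a smaller information potential and hence a larger quadratic entropy, mirroring how variance grows with spread.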


Keywords: Mutual information · Linear discriminant analysis · Gaussian mixture model · Confusion matrix · Shannon entropy


References

2. Ahn, J., Oh, J., Choi, S.: Learning principal directions: Integrated-squared-error minimization. Neurocomputing 70, 1372–1381 (2007)
7. Belhumeur, P.N., Hespanha, J., Kriegman, D.J.: Eigenfaces vs. fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence 19(7), 711–720 (1997)
8. Belkin, M., Niyogi, P., Sindhwani, V.: Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research 7, 2399–2434 (2006)
13. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press (2004)
15. Cai, D., He, X., Han, J.: Spectral regression for efficient regularized subspace learning. In: International Conference on Computer Vision, pp. 1–7 (2007)
23. Chartrand, R.: Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Processing Letters 14(10), 707–710 (2007)
28. Combettes, P.L., Pesquet, J.C.: Proximal thresholding algorithm for minimization over orthonormal bases. SIAM Journal on Optimization 18(4), 1351–1376 (2007)
31. Cover, T.M., Thomas, J.A.: Elements of Information Theory, 2nd edn. John Wiley, New York (2005)
38. Donoho, D.L.: Compressed sensing. IEEE Transactions on Information Theory 52(4), 1289–1306 (2006)
47. Feng, Y., Huang, X., Shi, L., Yang, Y., Suykens, J.A.K.: Learning with the maximum correntropy criterion induced losses for regression. Tech. rep., K.U. Leuven, Leuven, Belgium (2013)
51. Fornasier, M.: Theoretical foundations and numerical methods for sparse recovery. Radon Series on Computational and Applied Mathematics 9, 1–121 (2010)
56. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. Journal of Machine Learning Research 3, 1157–1182 (2003)
57. He, R., Ao, M., Xiang, S., Li, S.: Nearest feature line: A tangent approximation. In: Chinese Conference on Pattern Recognition (2008)
59. He, R., Hu, B.G., Yuan, X., Zheng, W.S.: Principal component analysis based on nonparametric maximum entropy. Neurocomputing 73, 1840–1852 (2010)
60. He, R., Hu, B.G., Zheng, W.S., Guo, Y.Q.: Two-stage sparse representation for robust recognition on large-scale database. In: AAAI Conference on Artificial Intelligence, pp. 475–480 (2010)
70. Hellier, P., Barillot, C., Memin, E., Perez, P.: An energy-based framework for dense 3D registration of volumetric brain images. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2000)
73. Hou, L., He, R.: Minimum entropy linear embedding based on Gaussian mixture model. In: Asian Conference on Pattern Recognition, pp. 362–366 (2011)
74. Hu, B.G., He, R., Yuan, X.: Information-theoretic measures for objective evaluation of classifications. Acta Automatica Sinica 38(7), 1169–1182 (2012)
75. Hu, B.G., Wang, Y.: Evaluation criteria based on mutual information for classifications including rejected class. Acta Automatica Sinica 34, 1396–1403 (2008)
76. Huber, P.: Robust Statistics. Wiley (1981)
77. Hyvärinen, A.: Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks 10, 626–634 (1999)
78. Hyvärinen, A.: Survey on independent component analysis. Neural Computing Surveys 2, 94–128 (1999)
79. Idier, J.: Convex half-quadratic criteria and interacting auxiliary variables for image restoration. IEEE Transactions on Image Processing 10(7), 1001–1009 (2001)
82. Jeong, K.H., Liu, W.F., Han, S., Hasanbelliu, E., Principe, J.C.: The correntropy MACE filter. Pattern Recognition 42(5), 871–885 (2009)
87. Laaksonen, J.: Local subspace classifier. In: International Conference on Artificial Neural Networks (1997)
92. Li, S.Z.: Face recognition based on nearest linear combinations. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 839–844 (1998)
99. Luenberger, D.: Optimization by Vector Space Methods. Wiley (1969)
100. Mairal, J., Elad, M., Sapiro, G.: Sparse representation for color image restoration. IEEE Transactions on Image Processing 17(1), 53–69 (2008)
104. Mazet, V., Carteret, C., Brie, D., Idier, J., Humbert, B.: Background removal from spectra by designing and minimising a non-quadratic cost function. Chemometrics and Intelligent Laboratory Systems 76(2), 121–133 (2005)
111. Nishii, R., Tanaka, S.: Accuracy and inaccuracy assessments in land-cover classification. IEEE Transactions on Geoscience and Remote Sensing 37, 491–498 (1999)
113. Niu, G., Jitkrittum, W., Dai, B., Hachiya, H., Sugiyama, M.: Squared-loss mutual information regularization: A novel information-theoretic approach to semi-supervised learning. In: International Conference on Machine Learning (2013)
117. Pearson, K.: On lines and planes of closest fit to systems of points in space. The London, Edinburgh and Dublin Philosophical Magazine and Journal of Science, Sixth Series 2, 559–572 (1901)
119. Plumbley, M.: Recovery of sparse representations by polytope faces pursuit. In: Proceedings of International Conference on Independent Component Analysis and Blind Source Separation, pp. 206–213 (2006)
124. Principe, J.C., Xu, D., Fisher, J.W.: Information-theoretic learning. In: Haykin, S. (ed.) Unsupervised Adaptive Filtering, Volume 1: Blind-Source Separation. Wiley (2000)
126. Rao, S., Liu, W., Principe, J.C., de Medeiros Martins, A.: Information theoretic mean shift algorithm. In: Machine Learning for Signal Processing (2006)
127. Rényi, A.: On measures of entropy and information. Selected Papers of Alfréd Rényi 2, 565–580 (1976)
129. Roullot, E., Herment, A., Bloch, I., de Cesare, A., Nikolova, M., Mousseaux, E.: Modeling anisotropic undersampling of magnetic resonance angiographies and reconstruction of a high-resolution isotropic volume using half-quadratic regularization techniques. Signal Processing 84(4), 743–762 (2004)
130. Rousseeuw, P.J.: Least median of squares regression. Journal of the American Statistical Association 79(388), 871–880 (1984)
134. Shannon, C.: A mathematical theory of communication. Bell System Technical Journal 27, 379–423, 623–656 (1948)
137. Shi, Y., Sha, F.: Information-theoretical learning of discriminative clusters for unsupervised domain adaptation. In: International Conference on Machine Learning (2012)
138. Siegel, A.F.: Robust regression using repeated medians. Biometrika 69(1), 242–244 (1982)
144. Vincent, P., Bengio, Y.: K-local hyperplane and convex distance nearest neighbor algorithms. In: Advances in Neural Information Processing Systems, vol. 14, pp. 985–992 (2001)
145. Vinh, N.X., Epps, J., Bailey, J.: Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research 11, 2837–2854 (2010)
160. Yang, S., Zha, H., Zhou, S., Hu, B.G.: Variational graph embedding for globally and locally consistent feature extraction. In: European Conference on Machine Learning (ECML), pp. 538–553 (2009)
165. Zhang, T.: Multi-stage convex relaxation for learning with sparse regularization. In: Proceedings of Neural Information Processing Systems, pp. 16–21 (2008)
169. Zhang, Z.: Parameter estimation techniques: A tutorial with application to conic fitting. Image and Vision Computing 15(1), 59–76 (1997)

Copyright information

© The Author(s) 2014

Authors and Affiliations

  1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
  2. School of Information and Control, Nanjing University of Information Science and Technology, Nanjing, China
