Classification with EEC, Divergence Measures, and Error Bounds

  • Deniz Erdogmus
  • Dongxin Xu
  • Kenneth Hild II
Part of the Information Science and Statistics book series (ISS)


The previous chapters provided extensive coverage of the error entropy criterion (EEC), especially with regard to the minimization of error entropy (MEE) for linear and nonlinear filtering (or regression) applications. However, the spectrum of engineering applications of adaptive systems is much broader than filtering or regression. Even within the subclass of supervised applications, we have yet to deal with classification, which is an important application area for learning technologies. All the practical ingredients needed to extend EEC to classification are in place, since Chapter 5 covered the integration of EEC with the backpropagation algorithm (MEE-BP). Hence we have all the tools needed to train classifiers with MEE. We show that this is indeed the case and that classifiers trained with MEE typically perform better than MSE-trained classifiers. However, there are still no mathematical foundations to ascertain under what conditions EEC is optimal for classification, and further work is necessary.
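The MEE training mentioned above rests on a nonparametric estimate of the error entropy. As a minimal sketch (not the book's reference implementation; the function names are illustrative), the snippet below estimates Rényi's quadratic entropy of the errors via a Parzen window with a Gaussian kernel of size sigma, using the fact that minimizing the quadratic entropy H2(e) = -log V(e) is equivalent to maximizing the information potential V(e):

```python
import numpy as np

def information_potential(errors, sigma=1.0):
    """Parzen estimate of the quadratic information potential V(e).

    V(e) = (1/N^2) sum_i sum_j G_{sigma*sqrt(2)}(e_i - e_j),
    where G is a Gaussian kernel; the kernel size sigma*sqrt(2)
    arises from the convolution of two sigma-width kernels.
    """
    errors = np.asarray(errors, dtype=float)
    diffs = errors[:, None] - errors[None, :]            # all pairwise error differences
    k = np.exp(-diffs**2 / (4.0 * sigma**2)) / (2.0 * sigma * np.sqrt(np.pi))
    return k.mean()

def mee_loss(errors, sigma=1.0):
    """Renyi quadratic error entropy H2(e) = -log V(e); minimized by MEE."""
    return -np.log(information_potential(errors, sigma))

# Concentrated errors have lower entropy than spread-out errors,
# which is what MEE exploits when training a classifier.
e_tight = np.zeros(10)
e_wide = np.linspace(-2.0, 2.0, 10)
print(mee_loss(e_tight) < mee_loss(e_wide))             # True
```

In an MEE-BP setup, this loss would replace MSE at the output of the network, with its gradient backpropagated through the pairwise kernel terms; the kernel size sigma (one of the keywords of this chapter) is the key free parameter of the estimator.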


Keywords: Mean Square Error, Mutual Information, Class Label, Synthetic Aperture Radar Image, Kernel Size



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Deniz Erdogmus
  • Dongxin Xu
  • Kenneth Hild II

  Dept. of Electrical Engineering & Biomedical Engineering, University of Florida, Gainesville, USA
