Modified minimum classification error learning and its application to neural networks

  • Hiroshi Shimodaira
  • Jun Rokui
  • Mitsuru Nakai
Learning Methodologies
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1451)


A novel method is proposed to improve the generalization performance of Minimum Classification Error (MCE) / Generalized Probabilistic Descent (GPD) learning. MCE/GPD learning, proposed by Juang and Katagiri in 1992, achieves better recognition performance than maximum-likelihood (ML) based learning in various areas of pattern recognition. Despite this superiority, it still suffers from "over-fitting" to the training samples, as do other learning algorithms. In the present study, a regularization technique is applied to MCE learning to overcome this problem. Feed-forward neural networks are employed as the recognition platform to evaluate the recognition performance of the proposed method. Recognition experiments are conducted on several kinds of datasets.
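To make the abstract's idea concrete, below is a minimal sketch of a smoothed MCE loss combined with a Tikhonov-style weight-decay regularizer, here for a simple linear discriminant rather than the paper's feed-forward networks. This is not the authors' exact formulation: the smoothing parameters `eta` and `xi`, the regularization weight `lam`, and the finite-difference descent step are illustrative assumptions.

```python
import numpy as np

def mce_loss(W, X, y, eta=4.0, xi=2.0, lam=1e-3):
    """Smoothed MCE loss with a Tikhonov (weight-decay) regularizer.

    W : (n_classes, n_features) linear discriminant weights
    d_k(x) = -g_k(x) + smooth-max over rival class scores
    per-sample loss = sigmoid(xi * d_k); total = mean + lam * ||W||^2
    """
    G = X @ W.T                                # discriminant scores g_j(x)
    n = X.shape[0]
    g_true = G[np.arange(n), y]                # score of the correct class
    G_rival = G.copy()
    G_rival[np.arange(n), y] = -np.inf         # mask out the true class
    # smooth max over rivals: (1/eta) * log sum exp(eta * g_j)
    rival = (1.0 / eta) * np.log(np.sum(np.exp(eta * G_rival), axis=1))
    d = -g_true + rival                        # misclassification measure
    loss = 1.0 / (1.0 + np.exp(-xi * d))       # sigmoid smoothing of the 0-1 loss
    return loss.mean() + lam * np.sum(W ** 2)  # Tikhonov penalty

def gpd_step(W, X, y, lr=0.1, **kw):
    """One descent step on the regularized MCE loss.

    Uses a central finite-difference gradient purely for illustration;
    a real implementation would back-propagate analytic gradients.
    """
    eps = 1e-5
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        Wp = W.copy(); Wp[idx] += eps
        Wm = W.copy(); Wm[idx] -= eps
        grad[idx] = (mce_loss(Wp, X, y, **kw) - mce_loss(Wm, X, y, **kw)) / (2 * eps)
    return W - lr * grad
```

Because the sigmoid saturates for confidently classified samples, their gradient contribution vanishes; the regularizer then keeps the weights from growing without bound on the remaining hard samples, which is the over-fitting the paper targets.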


Keywords: Discriminant Function, Recognition Performance, Dynamic Time Warping, Generalization Performance, Tikhonov Regularizer
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


References

  1. Keinosuke Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, 1972.
  2. B.-H. Juang and S. Katagiri. Discriminative learning for minimum error classification. IEEE Trans. Signal Processing, 40(12):3043–3054, 1992.
  3. S. Amari. A theory of adaptive pattern classifiers. IEEE Trans. Electronic Computers, EC-16(3):299–307, 1967.
  4. T. Kohonen. Learning Vector Quantization. Technical Report TKK-F-A601, Helsinki University of Technology, Laboratory of Computer and Information Science, 1978.
  5. E. Oja et al. The ALSM algorithm: an improved subspace method of classification. Pattern Recognition, 16(4):421–427, 1983.
  6. Eric McDermott and Shigeru Katagiri. Prototype-based minimum classification error / generalized probabilistic descent training for various speech units. Computer Speech and Language, pages 351–368, August 1994.
  7. Biing-Hwang Juang, Wu Chou, and Chin-Hui Lee. Minimum classification error rate methods for speech recognition. IEEE Trans. Speech and Audio Processing, 5(3):257–265, 1997.
  8. A. N. Tikhonov and V. Y. Arsenin. Solutions of Ill-Posed Problems. V. H. Winston, 1977.
  9. Vladimir N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, 1995.
  10. Christopher M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
  11. Christopher M. Bishop. Curvature-driven smoothing: a learning algorithm for feed-forward networks. IEEE Trans. Neural Networks, 4(5):882–884, 1993.
  12. D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning representations by back-propagating errors. Nature, 323:533–536, October 1986.
  13. C. J. Merz and P. M. Murphy. UCI repository of machine learning databases, 1996.
  14. H. Kuwabara, K. Takeda, Y. Sagisaka, S. Katagiri, S. Morikawa, and T. Watanabe. Construction of a large-scale Japanese speech database and its management system. Proc. of Intl. Conference on Acoustics, Speech, and Signal Processing (ICASSP89), pages 560–563, 1989.

Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Hiroshi Shimodaira (1)
  • Jun Rokui (1)
  • Mitsuru Nakai (1)

  1. School of Information Science, Japan Advanced Institute of Science and Technology, Tatsunokuchi, Ishikawa, Japan
