Abstract
We present a discriminative learning framework for Gaussian mixture models (GMMs) applied to classification, based on the extended Baum-Welch (EBW) algorithm [1]. We suggest two criteria for discriminative optimization, namely the class conditional likelihood (CL) and the maximum margin (MM). In the experiments, we present results for synthetic data, broad phonetic classification, and a remote sensing application. The experiments show that CL-optimized GMMs (CL-GMMs) achieve lower classification performance than MM-optimized GMMs (MM-GMMs), whereas both discriminative GMMs (DGMMs) perform significantly better than generatively learned GMMs. We also show that the discriminatively parameterized GMM classifiers retain the generative ability to marginalize over missing features, a case where generative classifiers have an advantage over purely discriminative classifiers such as support vector machines or neural networks.
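The marginalization property mentioned above can be illustrated with a minimal sketch (not the authors' code; all names are illustrative): with diagonal covariances, marginalizing a Gaussian over missing features simply drops the missing dimensions, so a GMM classifier can score a partially observed vector using only the observed dimensions.

```python
import numpy as np

def _log_gauss(x, mu, sd):
    # Elementwise log N(x | mu, sd^2) for a diagonal-covariance Gaussian.
    return -0.5 * np.log(2.0 * np.pi * sd**2) - 0.5 * ((x - mu) / sd) ** 2

def gmm_log_likelihood(x, weights, means, stds):
    # x may contain np.nan for missing features; marginalization with a
    # diagonal covariance reduces to summing over observed dimensions only.
    obs = ~np.isnan(x)
    comp = [np.log(w) + _log_gauss(x[obs], mu[obs], sd[obs]).sum()
            for w, mu, sd in zip(weights, means, stds)]
    return np.logaddexp.reduce(comp)  # log-sum-exp over mixture components

def classify(x, class_models, log_priors):
    # Bayes decision rule: argmax_c log p(c) + log p(x_observed | c).
    scores = {c: log_priors[c] + gmm_log_likelihood(x, *class_models[c])
              for c in class_models}
    return max(scores, key=scores.get)

# Two toy classes, one component each; the second feature is missing.
models = {
    0: ([1.0], [np.zeros(2)], [np.ones(2)]),
    1: ([1.0], [np.full(2, 5.0)], [np.ones(2)]),
}
log_priors = {0: np.log(0.5), 1: np.log(0.5)}
print(classify(np.array([4.8, np.nan]), models, log_priors))  # → 1
```

A purely discriminative classifier (e.g. an SVM) would instead need imputation or a dedicated missing-data mechanism to score such an input.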
This work was supported by the Austrian Science Fund (project numbers P22488-N23 and S10604-N13).
References
Gopalakrishnan, P.S., Kanevsky, D., Nádas, A., Nahamoo, D.: An inequality for rational functions with applications to some statistical estimation problems. IEEE Transactions on Information Theory 37(1), 107–113 (1991)
Vapnik, V.: Statistical learning theory. Wiley & Sons, Chichester (1998)
Schölkopf, B., Smola, A.: Learning with kernels: Support Vector Machines, regularization, optimization, and beyond. MIT Press, Cambridge (2001)
Taskar, B., Guestrin, C., Koller, D.: Max-margin Markov networks. In: Advances in Neural Information Processing Systems, NIPS (2003)
Guo, Y., Wilkinson, D., Schuurmans, D.: Maximum margin Bayesian networks. In: International Conference on Uncertainty in Artificial Intelligence, UAI (2005)
Roos, T., Wettig, H., Grünwald, P., Myllymäki, P., Tirri, H.: On discriminative Bayesian network classifiers and logistic regression. Machine Learning 59, 267–296 (2005)
Sha, F., Saul, L.: Large margin Gaussian mixture modeling for phonetic classification and recognition. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP (2006)
Sha, F., Saul, L.: Comparison of large margin training to other discriminative methods for phonetic recognition by hidden Markov models. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 313–316 (2007)
Heigold, G., Deselaers, T., Schlüter, R., Ney, H.: Modified MMI/MPE: A direct evaluation of the margin in speech recognition. In: International Conference on Machine Learning (ICML), pp. 384–391 (2008)
Collobert, R., Sinz, F., Weston, J., Bottou, L.: Trading convexity for scalability. In: International Conference on Machine Learning (ICML), pp. 201–208 (2006)
Schlüter, R., Macherey, W., Müller, B., Ney, H.: Comparison of discriminative training criteria and optimization methods for speech recognition. Speech Communication 34, 287–310 (2001)
Bahl, L., Brown, P., de Souza, P., Mercer, R.: Maximum Mutual Information estimation of HMM parameters for speech recognition. In: IEEE Conf. on Acoustics, Speech, and Signal Proc., pp. 49–52 (1986)
Woodland, P., Povey, D.: Large scale discriminative training of hidden Markov models for speech recognition. Computer Speech and Language 16, 25–47 (2002)
Klautau, A., Jevtić, N., Orlitsky, A.: Discriminative Gaussian mixture models: A comparison with kernel classifiers. In: Inter. Conf. on Machine Learning (ICML), pp. 353–360 (2003)
Pernkopf, F., Van Pham, T., Bilmes, J.: Broad phonetic classification using discriminative Bayesian networks. Speech Communication 143(1), 123–138 (2008)
Bishop, C.M.: Pattern recognition and machine learning. Springer, Heidelberg (2006)
Pernkopf, F., Bouchaffra, D.: Genetic-based EM algorithm for learning Gaussian mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(8), 1344–1348 (2005)
Merialdo, B.: Phonetic recognition using hidden Markov models and maximum mutual information training. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 111–114 (1988)
Normandin, Y., Morgera, S.: An improved MMIE training algorithm for speaker-independent small vocabulary, continuous speech recognition. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 537–540 (1991)
Normandin, Y., Cardin, R., De Mori, R.: High-performance connected digit recognition using maximum mutual information estimation. IEEE Trans. on Speech and Audio Proc. 2(2), 299–311 (1994)
Lamel, L., Kassel, R., Seneff, S.: Speech database development: Design and analysis of the acoustic-phonetic corpus. In: DARPA Speech Recognition Workshop, Report No. SAIC-86/1546 (1986)
Crammer, K., Singer, Y.: On the algorithmic interpretation of multiclass kernel-based vector machines. Journal of Machine Learning Research 2, 265–292 (2001)
Jain, A., Chandrasekaran, B.: Dimensionality and sample size considerations in pattern recognition in practice. Handbook of Statistics, vol. 2. North-Holland, Amsterdam (1982)
Baum, L., Eagon, J.: An inequality with applications to statistical prediction for functions of Markov processes and to a model of ecology. Bull. Amer. Math. Soc. 73, 360–363 (1967)
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Pernkopf, F., Wohlmayr, M. (2010). Large Margin Learning of Bayesian Classifiers Based on Gaussian Mixture Models. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2010. Lecture Notes in Computer Science, vol. 6323. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15939-8_4
Print ISBN: 978-3-642-15938-1
Online ISBN: 978-3-642-15939-8