Abstract
The dominant technology for the recognition of continuous speech is based on Hidden Markov Models (HMMs). These models provide a fundamental structure that is powerful and flexible, but the probability estimation techniques used with these models typically suffer from a number of significant limitations. Over the last few years, we have demonstrated that fairly simple Multi-Layered Perceptrons (MLPs) can be discriminatively trained to estimate emission probabilities for HMMs. Simple context-independent systems based on this approach have performed very well on large vocabulary continuous speech recognition. This chapter will briefly review the fundamentals of HMMs and MLPs, and will then describe a form of hybrid system that has some discriminant properties.
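The abstract does not spell out how the MLP outputs are used inside the HMM. As a minimal sketch, assuming the common hybrid recipe in which a classifier trained on state labels yields posteriors P(state | frame) and these are divided by the state priors to give likelihoods scaled by a per-frame constant, the conversion might look like the following. The function names, the three-state setup, and all numbers are hypothetical and purely illustrative.

```python
import math

def softmax(scores):
    """Turn raw MLP output scores into posteriors P(state | frame)."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    """Return log P(q|x) - log P(q) for each state q.

    By Bayes' rule this equals log p(x|q) minus log p(x); the second
    term is the same for every state within one frame.
    """
    return [
        math.log(max(post, floor)) - math.log(max(prior, floor))
        for post, prior in zip(posteriors, priors)
    ]

# Hypothetical values: one frame, three HMM states.
posteriors = softmax([2.0, 1.0, 0.0])   # assumed MLP output scores
priors = [0.5, 0.3, 0.2]                # assumed relative state frequencies
emission_scores = scaled_log_likelihoods(posteriors, priors)
```

The resulting scores could stand in for log emission probabilities during decoding, since the dropped term log p(x) does not depend on the state and so does not change which path scores best.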
Copyright information
© 1996 Kluwer Academic Publishers
About this chapter
Cite this chapter
Bourlard, H., Morgan, N. (1996). Hybrid Connectionist Models For Continuous Speech Recognition. In: Lee, CH., Soong, F.K., Paliwal, K.K. (eds) Automatic Speech and Speaker Recognition. The Kluwer International Series in Engineering and Computer Science, vol 355. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1367-0_11
DOI: https://doi.org/10.1007/978-1-4613-1367-0_11
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4612-8590-8
Online ISBN: 978-1-4613-1367-0
eBook Packages: Springer Book Archive