Hybrid Connectionist Models For Continuous Speech Recognition

Bourlard, Hervé; Morgan, Nelson

doi:10.1007/978-1-4613-1367-0_11

Part of the book series: The Kluwer International Series in Engineering and Computer Science (SECS, volume 355)

Abstract

The dominant technology for the recognition of continuous speech is based on Hidden Markov Models (HMMs). These models provide a fundamental structure that is powerful and flexible, but the probability estimation techniques used with these models typically suffer from a number of significant limitations. Over the last few years, we have demonstrated that fairly simple Multi-Layered Perceptrons (MLPs) can be discriminatively trained to estimate emission probabilities for HMMs. Simple context-independent systems based on this approach have performed very well on large vocabulary continuous speech recognition. This chapter will briefly review the fundamentals of HMMs and MLPs, and will then describe a form of hybrid system that has some discriminant properties.
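A common way to use MLP outputs as HMM emission probabilities rests on Bayes' rule. An MLP trained as a phone classifier approximates the posteriors P(q_k | x_n). Dividing each posterior by the class prior P(q_k) gives the scaled likelihood p(x_n | q_k) / p(x_n), which can stand in for the emission density during Viterbi decoding. The Python sketch below illustrates that step under stated assumptions. It is not the authors' implementation. The posteriors are toy numbers standing in for MLP outputs, the two-state left-to-right topology is made up, the function names `scaled_log_likelihoods` and `viterbi` are hypothetical, and the `floor` guarding log(0) is an arbitrary choice.

```python
import math
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def scaled_log_likelihoods(posteriors: Sequence[Sequence[float]],
                           priors: Sequence[float],
                           floor: float = 1e-10) -> List[List[float]]:
    """Convert per-frame posteriors P(q_k | x_n) into log scaled likelihoods.

    Bayes' rule: p(x_n | q_k) / p(x_n) = P(q_k | x_n) / P(q_k).
    The floor (a hypothetical choice) avoids taking log(0).
    """
    return [[math.log(max(p, floor)) - math.log(prior)
             for p, prior in zip(frame, priors)]
            for frame in posteriors]


def viterbi(log_emissions: Sequence[Sequence[float]],
            log_trans: Sequence[Sequence[float]],
            log_init: Sequence[float]) -> Tuple[List[int], float]:
    """Plain Viterbi search in the log domain with the given emission scores."""
    n_states = len(log_init)
    score = [log_init[k] + log_emissions[0][k] for k in range(n_states)]
    backptrs: List[List[int]] = []
    for frame in log_emissions[1:]:
        new_score, ptrs = [], []
        for k in range(n_states):
            # Best predecessor j for state k (max() is evaluated immediately).
            best_j = max(range(n_states), key=lambda j: score[j] + log_trans[j][k])
            new_score.append(score[best_j] + log_trans[best_j][k] + frame[k])
            ptrs.append(best_j)
        score = new_score
        backptrs.append(ptrs)
    # Trace back from the best-scoring final state.
    last = max(range(n_states), key=lambda k: score[k])
    path = [last]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, score[last]


# Toy usage: four frames, two states, equal priors (all numbers are made up).
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
priors = [0.5, 0.5]
log_trans = [[math.log(0.5), math.log(0.5)],  # state 0: stay or advance
             [NEG_INF, 0.0]]                  # state 1: final, self-loop only
log_init = [0.0, NEG_INF]                     # must start in state 0

emissions = scaled_log_likelihoods(posteriors, priors)
path, score = viterbi(emissions, log_trans, log_init)
print(path, score)
```

Because p(x_n) is the same for every state at frame n, it multiplies every path's score by the same factor. Decoding with scaled likelihoods therefore selects the same state sequence as decoding with true likelihoods would.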

Author information

Authors and Affiliations

International Computer Science Institute, Berkeley, CA, 94704, USA
Hervé Bourlard & Nelson Morgan

Editor information

Editors and Affiliations

AT&T Bell Laboratories, Murray Hill, NJ, 07974, USA
Chin-Hui Lee & Frank K. Soong
School of Microelectronic Engineering, Griffith University, Australia
Kuldip K. Paliwal

About this chapter

Cite this chapter

Bourlard, H., Morgan, N. (1996). Hybrid Connectionist Models For Continuous Speech Recognition. In: Lee, CH., Soong, F.K., Paliwal, K.K. (eds) Automatic Speech and Speaker Recognition. The Kluwer International Series in Engineering and Computer Science, vol 355. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1367-0_11

DOI: https://doi.org/10.1007/978-1-4613-1367-0_11
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4612-8590-8
Online ISBN: 978-1-4613-1367-0
eBook Packages: Springer Book Archive