Hybrid Connectionist Models For Continuous Speech Recognition

Part of the book series: The Kluwer International Series in Engineering and Computer Science (SECS, volume 355)

Abstract

The dominant technology for the recognition of continuous speech is based on Hidden Markov Models (HMMs). These models provide a fundamental structure that is powerful and flexible, but the probability estimation techniques used with these models typically suffer from a number of significant limitations. Over the last few years, we have demonstrated that fairly simple Multi-Layered Perceptrons (MLPs) can be discriminatively trained to estimate emission probabilities for HMMs. Simple context-independent systems based on this approach have performed very well on large vocabulary continuous speech recognition. This chapter will briefly review the fundamentals of HMMs and MLPs, and will then describe a form of hybrid system that has some discriminant properties.
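The abstract does not say how the MLP outputs are used inside the HMM. The sketch below assumes one common arrangement in hybrid systems: the network's per-frame state posteriors P(q|x) are divided by the state priors P(q), which by Bayes' rule gives scaled likelihoods proportional to p(x|q), and these stand in for the emission probabilities during Viterbi decoding. All numbers and function names here are hypothetical and chosen only for illustration; the posteriors are hard-coded rather than produced by a trained network.

```python
import math

def scaled_log_likelihoods(posteriors, priors):
    """Turn per-frame posteriors P(q|x) into log(P(q|x) / P(q)).

    By Bayes' rule this is log p(x|q) up to a per-frame constant,
    so it can stand in for an HMM emission log-probability.
    """
    return [[math.log(p) - math.log(pr) for p, pr in zip(frame, priors)]
            for frame in posteriors]

def viterbi(log_emis, log_trans, log_init):
    """Return the best-scoring state sequence for the given log scores."""
    n_states = len(log_init)
    score = [log_init[s] + log_emis[0][s] for s in range(n_states)]
    backpointers = []
    for frame in log_emis[1:]:
        prev = score
        # For each destination state j, pick the best predecessor i.
        ptr = [max(range(n_states), key=lambda i: prev[i] + log_trans[i][j])
               for j in range(n_states)]
        score = [prev[ptr[j]] + log_trans[ptr[j]][j] + frame[j]
                 for j in range(n_states)]
        backpointers.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Hypothetical MLP outputs: 4 frames, 2 states (assumed values).
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
priors = [0.5, 0.5]  # assumed state priors, e.g. relative frequencies in training data
log_trans = [[math.log(0.7), math.log(0.3)],
             [math.log(0.3), math.log(0.7)]]
log_init = [math.log(0.5), math.log(0.5)]

log_emis = scaled_log_likelihoods(posteriors, priors)
print(viterbi(log_emis, log_trans, log_init))  # prints [0, 0, 1, 1]
```

Working in the log domain avoids numerical underflow on long utterances, and the division by priors becomes a subtraction.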

© 1996 Kluwer Academic Publishers

About this chapter

Cite this chapter

Bourlard, H., Morgan, N. (1996). Hybrid Connectionist Models For Continuous Speech Recognition. In: Lee, CH., Soong, F.K., Paliwal, K.K. (eds) Automatic Speech and Speaker Recognition. The Kluwer International Series in Engineering and Computer Science, vol 355. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1367-0_11

  • DOI: https://doi.org/10.1007/978-1-4613-1367-0_11

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4612-8590-8

  • Online ISBN: 978-1-4613-1367-0

  • eBook Packages: Springer Book Archive
