The Use of Recurrent Neural Networks in Continuous Speech Recognition

Chapter in Automatic Speech and Speaker Recognition

Abstract

This chapter describes the use of recurrent neural networks (i.e., networks in which feedback is incorporated in the computation) as an acoustic model for continuous speech recognition. The form of the recurrent neural network is described along with an appropriate parameter estimation procedure. For each frame of acoustic data, the recurrent network generates an estimate of the posterior probability of each of the possible phones given the observed acoustic signal. The posteriors are then converted into scaled likelihoods and used as the observation probabilities within a conventional decoding paradigm (e.g., Viterbi decoding). The advantages of using recurrent networks are that they require a small number of parameters and provide a fast decoding capability (relative to conventional large-vocabulary HMM systems) [3].
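
As a concrete illustration of the decoding recipe described in the abstract, the short sketch below (my own example, not code from the chapter) divides frame-level phone posteriors by assumed phone priors to obtain scaled likelihoods and then runs a standard Viterbi search over a toy transition matrix. The function names, array shapes, and all numerical values are assumptions made purely for illustration.

```python
import numpy as np

def posteriors_to_scaled_likelihoods(posteriors, priors, eps=1e-10):
    # P(q | x_t) / P(q) is proportional to P(x_t | q), so dividing the network
    # outputs by the phone priors yields the "scaled likelihoods" used in decoding.
    return posteriors / np.maximum(priors, eps)

def viterbi(obs, log_trans, log_init):
    # Standard log-domain Viterbi search.
    # obs: (T, N) per-frame observation scores (here, scaled likelihoods).
    T, N = obs.shape
    log_obs = np.log(np.maximum(obs, 1e-10))
    delta = log_init + log_obs[0]                   # best log score ending in each state
    backptr = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans         # scores[i, j]: best path ending in i, then i -> j
        backptr[t] = np.argmax(scores, axis=0)
        delta = scores[backptr[t], np.arange(N)] + log_obs[t]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):                   # trace back the best state sequence
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, N = 8, 4                                     # 8 frames, 4 phone classes (toy sizes)
    posteriors = rng.dirichlet(np.ones(N), size=T)  # stand-in for recurrent-network outputs
    priors = np.full(N, 1.0 / N)                    # phone priors estimated from training data
    trans = 0.1 * np.ones((N, N)) + 0.6 * np.eye(N) # sticky toy phone transition matrix
    trans /= trans.sum(axis=1, keepdims=True)
    obs = posteriors_to_scaled_likelihoods(posteriors, priors)
    print(viterbi(obs, np.log(trans), np.log(np.full(N, 1.0 / N))))
```

Dividing by the priors lets the network's discriminative (posterior) outputs play the role of HMM observation probabilities up to a factor of P(x_t), which is constant across states and therefore does not affect the Viterbi path.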

References

  1. H. F. Silverman and D. P. Morgan, “The application of dynamic programming to connected speech recognition,” IEEE ASSP Magazine, vol. 7, pp. 6–25, July 1990.
  2. L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE, vol. 77, pp. 257–286, February 1989.
  3. N. Morgan and H. Bourlard, “Continuous speech recognition using multilayer perceptrons with hidden Markov models,” in Proc. ICASSP, pp. 413–416, 1990.
  4. S. Renals, N. Morgan, H. Bourlard, M. Cohen, and H. Franco, “Connectionist probability estimators in HMM speech recognition,” IEEE Transactions on Speech and Audio Processing, vol. 2, Jan. 1994.
  5. F. Jelinek and R. Mercer, “Interpolated estimation of Markov source parameters from sparse data,” Pattern Recognition in Practice, pp. 381–397, 1980.
  6. K.-F. Lee, Automatic Speech Recognition: The Development of the SPHINX System. Boston: Kluwer Academic Publishers, 1989.
  7. S. Furui, “Speaker-independent isolated word recognition using dynamic features of speech spectrum,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34, pp. 52–59, Feb. 1986.
  8. E. B. Baum and F. Wilczek, “Supervised learning of probability distributions by neural networks,” in Neural Information Processing Systems (D. Z. Anderson, ed.), American Institute of Physics, 1988.
  9. J. S. Bridle, “Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition,” in Neurocomputing: Algorithms, Architectures and Applications (F. Fogelman-Soulié and J. Hérault, eds.), pp. 227–236, Springer-Verlag, 1989.
  10. H. Bourlard and C. J. Wellekens, “Links between Markov models and multilayer perceptrons,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, pp. 1167–1178, Dec. 1990.
  11. H. Gish, “A probabilistic approach to the understanding and training of neural network classifiers,” in Proc. ICASSP, pp. 1361–1364, 1990.
  12. M. D. Richard and R. P. Lippmann, “Neural network classifiers estimate Bayesian a posteriori probabilities,” Neural Computation, vol. 3, pp. 461–483, 1991.
  13. H. Bourlard and N. Morgan, Connectionist Speech Recognition: A Hybrid Approach. Kluwer Academic Publishers, 1994.
  14. J. S. Bridle, “Alpha-Nets: A recurrent ‘neural’ network architecture with a hidden Markov model interpretation,” Speech Communication, vol. 9, pp. 83–92, Feb. 1990.
  15. J. S. Bridle and L. Dodd, “An Alphanet approach to optimising input transformations for continuous speech recognition,” in Proc. ICASSP, pp. 277–280, 1991.
  16. L. T. Niles and H. F. Silverman, “Combining hidden Markov models and neural network classifiers,” in Proc. ICASSP, pp. 417–420, 1990.
  17. S. J. Young, “Competitive training in hidden Markov models,” in Proc. ICASSP, pp. 681–684, 1990. Expanded in Tech. Rep. CUED/F-INFENG/TR.41, Cambridge University Engineering Department.
  18. A. J. Robinson and F. Fallside, “Static and dynamic error propagation networks with application to speech coding,” in Neural Information Processing Systems (D. Z. Anderson, ed.), American Institute of Physics, 1988.
  19. P. McCullagh and J. A. Nelder, Generalized Linear Models. London: Chapman and Hall, 1983.
  20. T. Robinson, “The state space and ‘ideal input’ representations of recurrent networks,” in Visual Representations of Speech Signals, pp. 327–334, John Wiley and Sons, 1993.
  21. A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm (with discussion),” J. Roy. Statist. Soc., vol. B39, pp. 1–38, 1977.
  22. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning internal representations by error propagation,” in Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. I: Foundations (D. E. Rumelhart and J. L. McClelland, eds.), ch. 8, Cambridge, MA: Bradford Books/MIT Press, 1986.
  23. P. J. Werbos, “Backpropagation through time: What it does and how to do it,” Proc. IEEE, vol. 78, pp. 1550–1560, Oct. 1990.
  24. R. A. Jacobs, “Increased rates of convergence through learning rate adaptation,” Neural Networks, vol. 1, pp. 295–307, 1988.
  25. W. Schiffmann, M. Joost, and R. Werner, “Optimization of the backpropagation algorithm for training multilayer perceptrons,” Tech. Rep., University of Koblenz, 1992.
  26. T. T. Jervis and W. J. Fitzgerald, “Optimization schemes for neural networks,” Tech. Rep. CUED/F-INFENG/TR.144, Cambridge University Engineering Department, Aug. 1993.
  27. M. M. Hochberg, S. J. Renals, A. J. Robinson, and D. J. Kershaw, “Large vocabulary continuous speech recognition using a hybrid connectionist-HMM system,” in Proc. ICSLP-94, pp. 1499–1502, 1994.
  28. M. M. Hochberg, G. D. Cook, S. J. Renals, and A. J. Robinson, “Connectionist model combination for large vocabulary speech recognition,” in Neural Networks for Signal Processing IV (J. Vlontzos, J.-N. Hwang, and E. Wilson, eds.), pp. 269–278, IEEE, 1994.
  29. T. H. Crystal and A. S. House, “Segmental durations in connected-speech signals: Current results,” J. Acoust. Soc. Am., vol. 83, pp. 1553–1573, Apr. 1988.
  30. L. R. Bahl and F. Jelinek, “Apparatus and method for determining a likely word sequence from labels generated by an acoustic processor,” US Patent 4,748,670, May 1988.
  31. D. B. Paul, “An efficient A* stack decoder algorithm for continuous speech recognition with a stochastic language model,” in Proc. ICASSP, vol. 1, San Francisco, pp. 25–28, 1992.
  32. S. J. Renals and M. M. Hochberg, “Decoder technology for connectionist large vocabulary speech recognition,” Tech. Rep. CUED/F-INFENG/TR.186, Cambridge University Engineering Department, 1994.
  33. S. Renals and M. Hochberg, “Efficient search using posterior phone probability estimates,” in Proc. ICASSP, pp. 596–599, 1995.
  34. P. S. Gopalakrishnan, D. Nahamoo, M. Padmanabhan, and M. A. Picheny, “A channel-bank-based phone detection strategy,” in Proc. ICASSP, vol. 2, Adelaide, pp. 161–164, 1994.

Copyright information

© 1996 Kluwer Academic Publishers

About this chapter

Cite this chapter

Robinson, T., Hochberg, M., Renals, S. (1996). The Use of Recurrent Neural Networks in Continuous Speech Recognition. In: Lee, C.-H., Soong, F.K., Paliwal, K.K. (eds) Automatic Speech and Speaker Recognition. The Kluwer International Series in Engineering and Computer Science, vol 355. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1367-0_10

  • DOI: https://doi.org/10.1007/978-1-4613-1367-0_10

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4612-8590-8

  • Online ISBN: 978-1-4613-1367-0
