Skip to main content

Recurrent Neural Networks and Related Models

  • Chapter
  • First Online:
Automatic Speech Recognition

Part of the book series: Signals and Communication Technology ((SCT))

Abstract

A recurrent neural network (RNN) is a class of neural network models where many connections among its neurons form a directed cycle. This gives rise to the structure of internal states or memory in the RNN, endowing it with the dynamic temporal behavior not exhibited by the DNN discussed in earlier chapters. In this chapter, we first present the state-space formulation of the basic RNN as a nonlinear dynamical system, where the recurrent matrix governing the system dynamics is largely unstructured. For such basic RNNs, we describe two algorithms for learning their parameters in some detail: (1) the most popular algorithm of backpropagation through time (BPTT); and (2) a more rigorous, primal-dual optimization technique, where constraints on the RNN’s recurrent matrix are imposed to guarantee stability during RNN learning. Going beyond basic RNNs, we further study an advanced version of the RNN, which exploits the structure called long-short-term memory (LSTM), and analyzes its strengths over the basic RNN both in terms of model construction and of practical applications including some latest speech recognition results. Finally, we analyze the RNN as a bottom-up, discriminative, dynamic system model against the top-down, generative counterpart of dynamic system as discussed in Chap. 4. The analysis and discussion lead to potentially more effective and advanced RNN-like architectures and learning paradigm where the strengths of discriminative and generative modeling are integrated while their respective weaknesses are overcome.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 129.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 169.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 169.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    Most of the contrasts discussed here can be generalized to the differences in learning general deep generative models (i.e., those with latent variables) and in learning deep discriminative models with neural network architectures.

  2. 2.

    With less careful engineering, the basic RNN accepting inputs of raw speech features could only achieve 71.8 % accuracy as reported in [21] before using DNN-derived input features.

  3. 3.

    For example, in [15], second-order dynamics with critical damping were used to incorporate such constraints.

References

  1. Bazzi, I., Acero, A., Deng, L.: An expectation-maximization approach for formant tracking using a parameter-free nonlinear predictor. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2003)

    Google Scholar 

  2. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

    Article  MATH  MathSciNet  Google Scholar 

  3. Bengio, Y.: Practical recommendations for gradient-based training of deep architectures. In: Neural Networks. Tricks of the Trade, pp. 437–478. Springer (2012)

    Google Scholar 

  4. Bengio, Y.: Estimating or propagating gradients through stochastic neurons. CoRR (2013)

    Google Scholar 

  5. Bengio, Y., Boulanger, N., Pascanu, R.: Advances in optimizing recurrent networks. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). Vancouver, Canada (2013)

    Google Scholar 

  6. Bengio, Y., Boulanger-Lewandowski, N., Pascanu, R.: Advances in optimizing recurrent networks. In: Proceeding of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). Vancouver, Canada (2013)

    Google Scholar 

  7. Boden, M.: A guide to recurrent neural networks and backpropagation. Technical Report T2002:03, SICS (2002)

    Google Scholar 

  8. Boyd, S.P., Vandenberghe, L.: Convex Optimization. Cambridge University Press (2004)

    Google Scholar 

  9. Bridle, J., Deng, L., Picone, J., Richards, H., Ma, J., Kamm, T., Schuster, M., Pike, S., Reagan, R.: An investigation fo segmental hidden dynamic models of speech coarticulation for automatic speech recognition. Final Report for 1998 Workshop on Langauge Engineering, CLSP, Johns Hopkins (1998)

    Google Scholar 

  10. Chen, J., Deng, L.: A primal-dual method for training recurrent neural networks constrained by the echo-state property. In: Proceeding of the ICLR (2014)

    Google Scholar 

  11. Cho, K., van Merrienboer, B., Gulcehre, C., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using rnn encoder-decoder for statistical machine translation. In: Conference on Empirical Methods in Natural Language Processing (EMNLP) (2014)

    Google Scholar 

  12. Dahl, G.E., Yu, D., Deng, L., Acero, A.: Large vocabulary continuous speech recognition with context-dependent DBN-HMMs. In: Proceeding of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4688–4691 (2011)

    Google Scholar 

  13. Dahl, G.E., Yu, D., Deng, L., Acero, A.: Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans. Audio, Speech Lang. Process. 20(1), 30–42 (2012)

    Article  Google Scholar 

  14. Danilo Jimenez Rezende Shakir Mohamed, D.W.: Stochastic backpropagation and approximate inference in deep generative models. In: Proceedings of the International Conference on Machine Learning (ICML) (2014)

    Google Scholar 

  15. Deng, L.: A dynamic, feature-based approach to the interface between phonology and phonetics for speech modeling and recognition. Speech Commun. 24(4), 299–323 (1998)

    Article  Google Scholar 

  16. Deng, L.: Computational models for speech production. In: Computational Models of Speech Pattern Processing, pp. 199–213. Springer, New York (1999)

    Google Scholar 

  17. Deng, L.: Switching dynamic system models for speech articulation and acoustics. In: Mathematical Foundations of Speech and Language Processing, pp. 115–134. Springer, New York (2003)

    Google Scholar 

  18. Deng, L.: Dyamic Speech Models—Theory, Algorithm, and Applications. Morgan and Claypool (2006)

    Google Scholar 

  19. Deng, L., Attias, H., Lee, L., Acero, A.: Adaptive kalman smoothing for tracking vocal tract resonances using a continuous-valued hidden dynamic model. IEEE Trans. Audio, Speech Lang. Process. 15, 13–23 (2007)

    Article  Google Scholar 

  20. Deng, L., Bazzi, I., Acero, A.: Tracking vocal tract resonances using an analytical nonlinear predictor and a target-guided temporal constraint. In: Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH) (2003)

    Google Scholar 

  21. Deng, L., Chen, J.: Sequence classification using high-level features extracted from deep neural networks. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2014)

    Google Scholar 

  22. Deng, L., Hassanein, K., Elmasry, M.: Analysis of correlation structure for a neural predictive model with application to speech recognition. Neural Netw. 7, 331–339 (1994)

    Article  Google Scholar 

  23. Deng, L., Hinton, G., Kingsbury, B.: New types of deep neural network learning for speech recognition and related applications: An overview. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). Vancouver, Canada (2013)

    Google Scholar 

  24. Deng, L., Hinton, G., Yu, D.: Deep learning for speech recognition and related applications. In: NIPS Workshop. Whistler, Canada (2009)

    Google Scholar 

  25. Deng, L., Lee, L., Attias, H., Acero, A.: Adaptive kalman filtering and smoothing for tracking vocal tract resonances using a continuous-valued hidden dynamic model. IEEE Trans. Audio, Speech Lang. Process. 15(1), 13–23 (2007)

    Article  Google Scholar 

  26. Deng, L., Li, J., Huang, J.T., Yao, K., Yu, D., Seide, F., Seltzer, M., Zweig, G., He, X., Williams, J., Gong, Y., Acero, A.: Recent advances in deep learning for speech research at microsoft. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). Vancouver, Canada (2013)

    Google Scholar 

  27. Deng, L., Li, X.: Machine learning paradigms in speech recognition: An overview. IEEE Trans. Audio, Speech Lang. Process. 21(5), 1060–1089 (2013)

    Article  Google Scholar 

  28. Deng, L., Ma, J.: Spontaneous speech recognition using a statistical coarticulatory model for the hidden vocal-tract-resonance dynamics. J. Acoust. Soc. Am. 108, 3036–3048 (2000)

    Article  Google Scholar 

  29. Deng, L., O’Shaughnessy, D.: Speech Processing—A Dynamic and Optimization-Oriented Approach. Marcel Dekker Inc, NY (2003)

    Google Scholar 

  30. Deng, L., Ramsay, G., Sun, D.: Production models as a structural basis for automatic speech recognition. Speech Commun. 33(2–3), 93–111 (1997)

    Article  Google Scholar 

  31. Deng, L., Togneri, R.: Deep dynamic models for learning hidden representations of speech features. In: Speech and Audio Processing for Coding, Enhancement and Recognition. Springer (2014)

    Google Scholar 

  32. Deng, L., Yu, D.: Use of differential cepstra as acoustic features in hidden trajectory modelling for phonetic recognition. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 445–448 (2007)

    Google Scholar 

  33. Deng, L., Yu, D., Acero, A.: A bidirectional target filtering model of speech coarticulation: two-stage implementation for phonetic recognition. IEEE Trans. Speech Audio Process. 14, 256–265 (2006)

    Article  Google Scholar 

  34. Deng, L., Yu, D., Acero, A.: Structured speech modeling. IEEE Trans. Speech Audio Process. 14, 1492–1504 (2006)

    Article  Google Scholar 

  35. Divenyi, P., Greenberg, S., Meyer, G.: Dynamics of Speech Production and Perception. IOS Press (2006)

    Google Scholar 

  36. Fan, Y., Qian, Y., Xie, F., Soong, F.K.: TTS synthesis with bidirectional lstm based recurrent neural networks. In: Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH) (2014)

    Google Scholar 

  37. Fernandez, R., Rendel, A., Ramabhadran, B., Hoory, R.: Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks. In: Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH) (2014)

    Google Scholar 

  38. Geiger, J., Zhang, Z., Weninger, F., Schuller, B., Rigoll, G.: Robust speech recognition using long short-term memory recurrent neural networks for hybrid acoustic modelling. In: Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH) (2014)

    Google Scholar 

  39. Gers, F., Schmidhuber, J., Cummins, F.: Learning to forget: continual prediction with lstm. Neural Comput. 12, 2451–2471 (2000)

    Article  Google Scholar 

  40. Gers, F., Schraudolph, N., Schmidhuber, J.: Learning precise timing with lstm recurrent networks. J. Mach. Learn. Res. 3, 115–143 (2002)

    MathSciNet  Google Scholar 

  41. Ghahramani, Z., Hinton, G.E.: Variational learning for switching state-space models. Neural Comput. 12, 831–864 (2000)

    Article  Google Scholar 

  42. Gonzalez, J., Lopez-Moreno, I., Sak, H., Gonzalez-Rodriguez, J., Moreno, P.: Automatic language identification using long short-term memory recurrent neural networks. In: Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH) (2014)

    Google Scholar 

  43. Graves, A.: Sequence transduction with recurrent neural networks. In: ICML Representation Learning Workshop (2012)

    Google Scholar 

  44. Graves, A.: Generating sequences with recurrent neural networks. arXvi preprint. arXiv:1308.0850 (2013)

  45. Graves, A., Jaitly, N., Mahamed, A.: Hybrid speech recognition with deep bidirectional lstm. In: Proceeding of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). Vancouver, Canada (2013)

    Google Scholar 

  46. Graves, A., Mahamed, A., Hinton, G.: Speech recognition with deep recurrent neural networks. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). Vancouver, Canada (2013)

    Google Scholar 

  47. Heigold, G., Vanhoucke, V., Senior, A., Nguyen, P., Ranzato, M., Devin, M., Dean, J.: Multilingual acoustic models using distributed deep neural networks. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2013)

    Google Scholar 

  48. Hermans, M., Schrauwen, B.: Training and analysing deep recurrent neural networks. In: Proceedings of the Neural Information Processing Systems (NIPS) (2013)

    Google Scholar 

  49. Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T., Kingsbury, B.: Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Process. Mag. 29(6), 82–97 (2012)

    Article  Google Scholar 

  50. Hinton, G., Deng, L., Yu, D., Dahl, G.E.: Mohamed, A.r., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., et al.: Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Process. Mag. 29(6), 82–97 (2012)

    Article  Google Scholar 

  51. Hinton, G., Osindero, S., Teh, Y.: A fast learning algorithm for deep belief nets. Neural Comput. 18, 1527–1554 (2006)

    Article  MATH  MathSciNet  Google Scholar 

  52. Hinton, G., Salakhutdinov, R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)

    Article  MATH  MathSciNet  Google Scholar 

  53. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)

    Article  Google Scholar 

  54. Hoffman, M.D., Blei, D.M., Wang, C., Paisley, J.: Stochastic variational inference

    Google Scholar 

  55. Jaeger, H.: Short term memory in echo state networks. GMD Report 152,GMD—German National Research Institute for Computer Science (2001)

    Google Scholar 

  56. Jaeger, H.: Tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the “echo state network” approach. GMD Report 159, GMD—German National Research Institute for Computer Science (2002)

    Google Scholar 

  57. Jordan, M., Sudderth, E., Wainwright, M., Wilsky, A.: Major advances and emerging developments of graphical models, special issue. IEEE Signal Process. Mag. 27(6), 17,138 (2010)

    Google Scholar 

  58. Kingma, D., Welling, M.: Auto-encoding variational bayes. In: arXiv:1312.6114v10 (2014)

  59. Kingma, D., Welling, M.: Efficient gradient-based inference through transformations between bayes nets and neural nets. In: Proceedings of the International Conference on Machine Learning (ICML) (2014)

    Google Scholar 

  60. Kingsbury, B., Sainath, T.N., Soltau, H.: Scalable minimum bayes risk training of deep neural network acoustic models using distributed hessian-free optimization. In: Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH) (2012)

    Google Scholar 

  61. Lee, L., Attias, H., Deng, L.: Variational inference and learning for segmental switching state space models of hidden speech dynamics. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, pp. I-872–I-875 (2003)

    Google Scholar 

  62. Ma, J., Deng, L.: A path-stack algorithm for optimizing dynamic regimes in a statistical hidden dynamic model of speech. Comput. Speech Lang. 14, 101–104 (2000)

    Google Scholar 

  63. Ma, J., Deng, L.: Efficient decoding strategies for conversational speech recognition using a constrained nonlinear state-space model. IEEE Trans. Audio Speech Process. 11(6), 590–602 (2003)

    Google Scholar 

  64. Ma, J., Deng, L.: Efficient decoding strategies for conversational speech recognition using a constrained nonlinear state-space model. IEEE Trans. Audio, Speech Lang. Process. 11(6), 590–602 (2004)

    Google Scholar 

  65. Ma, J., Deng, L.: Target-directed mixture dynamic models for spontaneous speech recognition. IEEE Trans. Audio Speech Process. 12(1), 47–58 (2004)

    Article  Google Scholar 

  66. Maas, A.L., Le, Q., O’Neil, T.M., Vinyals, O., Nguyen, P., Ng, A.Y.: Recurrent neural networks for noise reduction in robust asr. In: Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH). Portland, OR (2012)

    Google Scholar 

  67. Mesnil, G., He, X., Deng, L., Bengio, Y.: Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding. In: Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH). Lyon, France (2013)

    Google Scholar 

  68. Mikolov, T.: Rnntoolkit http://www.fit.vutbr.cz/imikolov/rnnlm/ (2012). http://www.fit.vutbr.cz/~imikolov/rnnlm/

  69. Mikolov, T.: Statistical Language Models Based on Neural Networks. Ph.D. thesis, Brno University of Technology (2012)

    Google Scholar 

  70. Mikolov, T., Deoras, A., Povey, D., Burget, L., Cernocky, J.: Strategies for training large scale neural network language models. In: Proceedings of the IEEE Workshop on Automfatic Speech Recognition and Understanding (ASRU), pp. 196–201. IEEE, Honolulu, HI (2011)

    Google Scholar 

  71. Mikolov, T., Karafiát, M., Burget, L., Cernockỳ, J., Khudanpur, S.: Recurrent neural network based language model. In: Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH), pp. 1045–1048. Makuhari, Japan (2010)

    Google Scholar 

  72. Mikolov, T., Kombrink, S., Burget, L., Cernocky, J., Khudanpur, S.: Extensions of recurrent neural network language model. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5528–5531. Prague, Czech (2011)

    Google Scholar 

  73. Mikolov, T., Zweig, G.: Context dependent recurrent neural network language model. In: Proceedings of the IEEE Spoken Language Technology Workshop (SLT), pp. 234–239 (2012)

    Google Scholar 

  74. Mnih, A., Gregor, K.: Neural variational inference and learning in belief networks. In: Proceedings of the International Conference on Machine Learning (ICML) (2014)

    Google Scholar 

  75. Mohamed, A.r., Dahl, G.E., Hinton, G.: Deep belief networks for phone recognition. In: NIPS Workshop on Deep Learning for Speech Recognition and Related Applications (2009)

    Google Scholar 

  76. Ozkan, E., Ozbek, I., Demirekler, M.: Dynamic speech spectrum representation and tracking variable number of vocal tract resonance frequencies with time-varying dirichlet process mixture models. IEEE Trans. Audio, Speech Lang. Process. 17(8), 1518–1532 (2009)

    Article  Google Scholar 

  77. Pascanu, R., Gulcehre, C., Cho, K., Bengio, Y.: How to construct deep recurrent neural networks. In: The 2nd International Conference on Learning Representation (ICLR) (2014)

    Google Scholar 

  78. Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neural networks. In: Proceedings of the International Conference on Machine Learning (ICML). Atlanta, GA (2013)

    Google Scholar 

  79. Pavlovic, V., Frey, B., Huang, T.: Variational learning in mixed-state dynamic graphical models. In: UAI, pp. 522–530. Stockholm (1999)

    Google Scholar 

  80. Picone, J., Pike, S., Regan, R., Kamm, T., bridle, J., Deng, L., Ma, Z., Richards, H., Schuster, M.: Initial evaluation of hidden dynamic models on conversational speech. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) (1999)

    Google Scholar 

  81. Robinson, A.J.: An application of recurrent nets to phone probability estimation. IEEE Trans. Neural Netw. 5(2), 298–305 (1994)

    Article  Google Scholar 

  82. Robinson, A.J., Cook, G., Ellis, D.P., Fosler-Lussier, E., Renals, S., Williams, D.: Connectionist speech recognition of broadcast news. Speech Commun. 37(1), 27–45 (2002)

    Article  MATH  Google Scholar 

  83. Rumelhart, D.E., Hintont, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323(6088), 533–536 (1986)

    Article  Google Scholar 

  84. Sainath, T., Kingsbury, B., Soltau, H., Ramabhadran, B.: Optimization techniques to improve training speed of deep neural networks for large speech tasks. IEEE Trans. Audio, Speech, Lang. Process. 21(11), 2267–2276 (2013)

    Article  Google Scholar 

  85. Sainath, T.N., Kingsbury, B., Mohamed, A.r., Dahl, G.E., Saon, G., Soltau, H., Beran, T., Aravkin, A.Y., Ramabhadran, B.: Improvements to deep convolutional neural networks for lvcsr. In: Proceedings of the IEEE Workshop on Automfatic Speech Recognition and Understanding (ASRU), pp. 315–320 (2013)

    Google Scholar 

  86. Sainath, T.N., Kingsbury, B., Mohamed, A.r., Ramabhadran, B.: Learning filter banks within a deep neural network framework. In: Proceedings of the IEEE Workshop on Automfatic Speech Recognition and Understanding (ASRU) (2013)

    Google Scholar 

  87. Sainath, T.N., Kingsbury, B., Sindhwani, V., Arisoy, E., Ramabhadran, B.: Low-rank matrix factorization for deep neural network training with high-dimensional output targets. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6655–6659 (2013)

    Google Scholar 

  88. Sak, H., Senior, A., Beaufays, F.: Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In: Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH) (2014)

    Google Scholar 

  89. Sak, H., Vinyals, O., Heigold, G., Senior, A., McDermott, E., Monga, R., Mao, M.: Sequence discriminative distributed training of long short-term memory recurrent neural networks. In: Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH) (2014)

    Google Scholar 

  90. Schmidhuber, J.: Deep learning in neural networks: an overview. CoRR abs/1404.7828 (2014)

    Google Scholar 

  91. Seide, F., Fu, H., Droppo, J., Li, G., Yu, D.: On parallelizability of stochastic gradient descent for speech dnns. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2014)

    Google Scholar 

  92. Seide, F., Li, G., Yu, D.: Conversational speech transcription using context-dependent deep neural networks. In: Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH), pp. 437–440 (2011)

    Google Scholar 

  93. Shen, X., Deng, L.: Maximum likelihood in statistical estimation of dynamical systems: Decomposition algorithm and simulation results. Signal Process. 57, 65–79 (1997)

    Article  MATH  Google Scholar 

  94. Stevens, K.: Acoustic Phonetics. MIT Press (2000)

    Google Scholar 

  95. Stoyanov, V., Ropson, A., Eisner, J.: Empirical risk minimization of graphical model parameters given approximate inference, decoding, and model structure. In: Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS) (2011)

    Google Scholar 

  96. Sutskever, I.: Training Recurrent Neural Networks. Ph.D. thesis, University of Toronto (2013)

    Google Scholar 

  97. Togneri, R., Deng, L.: Joint state and parameter estimation for a target-directed nonlinear dynamic system model. IEEE Trans. Signal Process. 51(12), 3061–3070 (2003)

    Article  MathSciNet  Google Scholar 

  98. Togneri, R., Deng, L.: A state-space model with neural-network prediction for recovering vocal tract resonances in fluent speech from mel-cepstral coefficients. Speech Commun. 48(8), 971–988 (2006)

    Article  Google Scholar 

  99. Triefenbach, F., Jalalvand, A., Demuynck, K., Martens, J.P.: Acoustic modeling with hierarchical reservoirs. IEEE Trans. Audio, Speech, Lang. Process. 21(11), 2439–2450 (2013)

    Article  Google Scholar 

  100. Vanhoucke, V., Devin, M., Heigold, G.: Multiframe deep neural networks for acoustic modeling

    Google Scholar 

  101. Vanhoucke, V., Senior, A., Mao, M.Z.: Improving the speed of neural networks on CPUs. In: Proceedings of the NIPS Workshop on Deep Learning and Unsupervised Feature Learning (2011)

    Google Scholar 

  102. Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., Lang, K.J.: Phoneme recognition using time-delay neural networks. IEEE Trans. Speech Audio Process. 37(3), 328–339 (1989)

    Article  Google Scholar 

  103. Weninger, F., Geiger, J., Wollmer, M., Schuller, B., Rigoll, G.: Feature enhancement by deep lstm networks for ASR in reverberant multisource environments. Comput. Speech and Lang. 888–902 (2014)

    Google Scholar 

  104. Xing, E., Jordan, M., Russell, S.: A generalized mean field algorithm for variational inference in exponential families. In: Proceedings of the Uncertainty in Artificial Intelligence (2003)

    Google Scholar 

  105. Yu, D., Deng, L.: Speaker-adaptive learning of resonance targets in a hidden trajectory model of speech coarticulation. Comput. Speech Lang. 27, 72–87 (2007)

    Article  Google Scholar 

  106. Yu, D., Deng, L.: Deep-structured hidden conditional random fields for phonetic recognition. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2010)

    Google Scholar 

  107. Yu, D., Deng, L., Acero, A.: A lattice search technique for a long-contextual-span hidden trajectory model of speech. Speech Commun. 48, 1214–1226 (2006)

    Article  Google Scholar 

  108. Yu, D., Deng, L., Dahl, G.: Roles of pre-training and fine-tuning in context-dependent DBN-HMMs for real-world speech recognition. In: Proceedings of the Neural Information Processing Systems (NIPS) Workshop on Deep Learning and Unsupervised Feature Learning (2010)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Dong Yu .

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer-Verlag London

About this chapter

Cite this chapter

Yu, D., Deng, L. (2015). Recurrent Neural Networks and Related Models. In: Automatic Speech Recognition. Signals and Communication Technology. Springer, London. https://doi.org/10.1007/978-1-4471-5779-3_13

Download citation

  • DOI: https://doi.org/10.1007/978-1-4471-5779-3_13

  • Published:

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-5778-6

  • Online ISBN: 978-1-4471-5779-3

  • eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics