Recurrent Neural Networks and Related Models

Yu, Dong; Deng, Li

doi:10.1007/978-1-4471-5779-3_13

Dong Yu³ &
Li Deng⁴

Part of the book series: Signals and Communication Technology ((SCT))

13k Accesses
3 Citations

Abstract

A recurrent neural network (RNN) is a class of neural network models where many connections among its neurons form a directed cycle. This gives rise to the structure of internal states or memory in the RNN, endowing it with the dynamic temporal behavior not exhibited by the DNN discussed in earlier chapters. In this chapter, we first present the state-space formulation of the basic RNN as a nonlinear dynamical system, where the recurrent matrix governing the system dynamics is largely unstructured. For such basic RNNs, we describe two algorithms for learning their parameters in some detail: (1) the most popular algorithm of backpropagation through time (BPTT); and (2) a more rigorous, primal-dual optimization technique, where constraints on the RNN’s recurrent matrix are imposed to guarantee stability during RNN learning. Going beyond basic RNNs, we further study an advanced version of the RNN, which exploits the structure called long-short-term memory (LSTM), and analyzes its strengths over the basic RNN both in terms of model construction and of practical applications including some latest speech recognition results. Finally, we analyze the RNN as a bottom-up, discriminative, dynamic system model against the top-down, generative counterpart of dynamic system as discussed in Chap. 4. The analysis and discussion lead to potentially more effective and advanced RNN-like architectures and learning paradigm where the strengths of discriminative and generative modeling are integrated while their respective weaknesses are overcome.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 129.00; Price excludes VAT (USA)

Softcover Book: USD 169.99; Price excludes VAT (USA)

Hardcover Book: USD 169.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
Most of the contrasts discussed here can be generalized to the differences in learning general deep generative models (i.e., those with latent variables) and in learning deep discriminative models with neural network architectures.
2.
With less careful engineering, the basic RNN accepting inputs of raw speech features could only achieve 71.8 % accuracy as reported in [21] before using DNN-derived input features.
3.
For example, in [15], second-order dynamics with critical damping were used to incorporate such constraints.

References

Bazzi, I., Acero, A., Deng, L.: An expectation-maximization approach for formant tracking using a parameter-free nonlinear predictor. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2003)
Google Scholar
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Article MATH MathSciNet Google Scholar
Bengio, Y.: Practical recommendations for gradient-based training of deep architectures. In: Neural Networks. Tricks of the Trade, pp. 437–478. Springer (2012)
Google Scholar
Bengio, Y.: Estimating or propagating gradients through stochastic neurons. CoRR (2013)
Google Scholar
Bengio, Y., Boulanger, N., Pascanu, R.: Advances in optimizing recurrent networks. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). Vancouver, Canada (2013)
Google Scholar
Bengio, Y., Boulanger-Lewandowski, N., Pascanu, R.: Advances in optimizing recurrent networks. In: Proceeding of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). Vancouver, Canada (2013)
Google Scholar
Boden, M.: A guide to recurrent neural networks and backpropagation. Technical Report T2002:03, SICS (2002)
Google Scholar
Boyd, S.P., Vandenberghe, L.: Convex Optimization. Cambridge University Press (2004)
Google Scholar
Bridle, J., Deng, L., Picone, J., Richards, H., Ma, J., Kamm, T., Schuster, M., Pike, S., Reagan, R.: An investigation fo segmental hidden dynamic models of speech coarticulation for automatic speech recognition. Final Report for 1998 Workshop on Langauge Engineering, CLSP, Johns Hopkins (1998)
Google Scholar
Chen, J., Deng, L.: A primal-dual method for training recurrent neural networks constrained by the echo-state property. In: Proceeding of the ICLR (2014)
Google Scholar
Cho, K., van Merrienboer, B., Gulcehre, C., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using rnn encoder-decoder for statistical machine translation. In: Conference on Empirical Methods in Natural Language Processing (EMNLP) (2014)
Google Scholar
Dahl, G.E., Yu, D., Deng, L., Acero, A.: Large vocabulary continuous speech recognition with context-dependent DBN-HMMs. In: Proceeding of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4688–4691 (2011)
Google Scholar
Dahl, G.E., Yu, D., Deng, L., Acero, A.: Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans. Audio, Speech Lang. Process. 20(1), 30–42 (2012)
Article Google Scholar
Danilo Jimenez Rezende Shakir Mohamed, D.W.: Stochastic backpropagation and approximate inference in deep generative models. In: Proceedings of the International Conference on Machine Learning (ICML) (2014)
Google Scholar
Deng, L.: A dynamic, feature-based approach to the interface between phonology and phonetics for speech modeling and recognition. Speech Commun. 24(4), 299–323 (1998)
Article Google Scholar
Deng, L.: Computational models for speech production. In: Computational Models of Speech Pattern Processing, pp. 199–213. Springer, New York (1999)
Google Scholar
Deng, L.: Switching dynamic system models for speech articulation and acoustics. In: Mathematical Foundations of Speech and Language Processing, pp. 115–134. Springer, New York (2003)
Google Scholar
Deng, L.: Dyamic Speech Models—Theory, Algorithm, and Applications. Morgan and Claypool (2006)
Google Scholar
Deng, L., Attias, H., Lee, L., Acero, A.: Adaptive kalman smoothing for tracking vocal tract resonances using a continuous-valued hidden dynamic model. IEEE Trans. Audio, Speech Lang. Process. 15, 13–23 (2007)
Article Google Scholar
Deng, L., Bazzi, I., Acero, A.: Tracking vocal tract resonances using an analytical nonlinear predictor and a target-guided temporal constraint. In: Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH) (2003)
Google Scholar
Deng, L., Chen, J.: Sequence classification using high-level features extracted from deep neural networks. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2014)
Google Scholar
Deng, L., Hassanein, K., Elmasry, M.: Analysis of correlation structure for a neural predictive model with application to speech recognition. Neural Netw. 7, 331–339 (1994)
Article Google Scholar
Deng, L., Hinton, G., Kingsbury, B.: New types of deep neural network learning for speech recognition and related applications: An overview. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). Vancouver, Canada (2013)
Google Scholar
Deng, L., Hinton, G., Yu, D.: Deep learning for speech recognition and related applications. In: NIPS Workshop. Whistler, Canada (2009)
Google Scholar
Deng, L., Lee, L., Attias, H., Acero, A.: Adaptive kalman filtering and smoothing for tracking vocal tract resonances using a continuous-valued hidden dynamic model. IEEE Trans. Audio, Speech Lang. Process. 15(1), 13–23 (2007)
Article Google Scholar
Deng, L., Li, J., Huang, J.T., Yao, K., Yu, D., Seide, F., Seltzer, M., Zweig, G., He, X., Williams, J., Gong, Y., Acero, A.: Recent advances in deep learning for speech research at microsoft. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). Vancouver, Canada (2013)
Google Scholar
Deng, L., Li, X.: Machine learning paradigms in speech recognition: An overview. IEEE Trans. Audio, Speech Lang. Process. 21(5), 1060–1089 (2013)
Article Google Scholar
Deng, L., Ma, J.: Spontaneous speech recognition using a statistical coarticulatory model for the hidden vocal-tract-resonance dynamics. J. Acoust. Soc. Am. 108, 3036–3048 (2000)
Article Google Scholar
Deng, L., O’Shaughnessy, D.: Speech Processing—A Dynamic and Optimization-Oriented Approach. Marcel Dekker Inc, NY (2003)
Google Scholar
Deng, L., Ramsay, G., Sun, D.: Production models as a structural basis for automatic speech recognition. Speech Commun. 33(2–3), 93–111 (1997)
Article Google Scholar
Deng, L., Togneri, R.: Deep dynamic models for learning hidden representations of speech features. In: Speech and Audio Processing for Coding, Enhancement and Recognition. Springer (2014)
Google Scholar
Deng, L., Yu, D.: Use of differential cepstra as acoustic features in hidden trajectory modelling for phonetic recognition. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 445–448 (2007)
Google Scholar
Deng, L., Yu, D., Acero, A.: A bidirectional target filtering model of speech coarticulation: two-stage implementation for phonetic recognition. IEEE Trans. Speech Audio Process. 14, 256–265 (2006)
Article Google Scholar
Deng, L., Yu, D., Acero, A.: Structured speech modeling. IEEE Trans. Speech Audio Process. 14, 1492–1504 (2006)
Article Google Scholar
Divenyi, P., Greenberg, S., Meyer, G.: Dynamics of Speech Production and Perception. IOS Press (2006)
Google Scholar
Fan, Y., Qian, Y., Xie, F., Soong, F.K.: TTS synthesis with bidirectional lstm based recurrent neural networks. In: Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH) (2014)
Google Scholar
Fernandez, R., Rendel, A., Ramabhadran, B., Hoory, R.: Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks. In: Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH) (2014)
Google Scholar
Geiger, J., Zhang, Z., Weninger, F., Schuller, B., Rigoll, G.: Robust speech recognition using long short-term memory recurrent neural networks for hybrid acoustic modelling. In: Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH) (2014)
Google Scholar
Gers, F., Schmidhuber, J., Cummins, F.: Learning to forget: continual prediction with lstm. Neural Comput. 12, 2451–2471 (2000)
Article Google Scholar
Gers, F., Schraudolph, N., Schmidhuber, J.: Learning precise timing with lstm recurrent networks. J. Mach. Learn. Res. 3, 115–143 (2002)
MathSciNet Google Scholar
Ghahramani, Z., Hinton, G.E.: Variational learning for switching state-space models. Neural Comput. 12, 831–864 (2000)
Article Google Scholar
Gonzalez, J., Lopez-Moreno, I., Sak, H., Gonzalez-Rodriguez, J., Moreno, P.: Automatic language identification using long short-term memory recurrent neural networks. In: Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH) (2014)
Google Scholar
Graves, A.: Sequence transduction with recurrent neural networks. In: ICML Representation Learning Workshop (2012)
Google Scholar
Graves, A.: Generating sequences with recurrent neural networks. arXvi preprint. arXiv:1308.0850 (2013)
Graves, A., Jaitly, N., Mahamed, A.: Hybrid speech recognition with deep bidirectional lstm. In: Proceeding of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). Vancouver, Canada (2013)
Google Scholar
Graves, A., Mahamed, A., Hinton, G.: Speech recognition with deep recurrent neural networks. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). Vancouver, Canada (2013)
Google Scholar
Heigold, G., Vanhoucke, V., Senior, A., Nguyen, P., Ranzato, M., Devin, M., Dean, J.: Multilingual acoustic models using distributed deep neural networks. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2013)
Google Scholar
Hermans, M., Schrauwen, B.: Training and analysing deep recurrent neural networks. In: Proceedings of the Neural Information Processing Systems (NIPS) (2013)
Google Scholar
Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T., Kingsbury, B.: Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Process. Mag. 29(6), 82–97 (2012)
Article Google Scholar
Hinton, G., Deng, L., Yu, D., Dahl, G.E.: Mohamed, A.r., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., et al.: Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Process. Mag. 29(6), 82–97 (2012)
Article Google Scholar
Hinton, G., Osindero, S., Teh, Y.: A fast learning algorithm for deep belief nets. Neural Comput. 18, 1527–1554 (2006)
Article MATH MathSciNet Google Scholar
Hinton, G., Salakhutdinov, R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
Article MATH MathSciNet Google Scholar
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Article Google Scholar
Hoffman, M.D., Blei, D.M., Wang, C., Paisley, J.: Stochastic variational inference
Google Scholar
Jaeger, H.: Short term memory in echo state networks. GMD Report 152,GMD—German National Research Institute for Computer Science (2001)
Google Scholar
Jaeger, H.: Tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the “echo state network” approach. GMD Report 159, GMD—German National Research Institute for Computer Science (2002)
Google Scholar
Jordan, M., Sudderth, E., Wainwright, M., Wilsky, A.: Major advances and emerging developments of graphical models, special issue. IEEE Signal Process. Mag. 27(6), 17,138 (2010)
Google Scholar
Kingma, D., Welling, M.: Auto-encoding variational bayes. In: arXiv:1312.6114v10 (2014)
Kingma, D., Welling, M.: Efficient gradient-based inference through transformations between bayes nets and neural nets. In: Proceedings of the International Conference on Machine Learning (ICML) (2014)
Google Scholar
Kingsbury, B., Sainath, T.N., Soltau, H.: Scalable minimum bayes risk training of deep neural network acoustic models using distributed hessian-free optimization. In: Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH) (2012)
Google Scholar
Lee, L., Attias, H., Deng, L.: Variational inference and learning for segmental switching state space models of hidden speech dynamics. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, pp. I-872–I-875 (2003)
Google Scholar
Ma, J., Deng, L.: A path-stack algorithm for optimizing dynamic regimes in a statistical hidden dynamic model of speech. Comput. Speech Lang. 14, 101–104 (2000)
Google Scholar
Ma, J., Deng, L.: Efficient decoding strategies for conversational speech recognition using a constrained nonlinear state-space model. IEEE Trans. Audio Speech Process. 11(6), 590–602 (2003)
Google Scholar
Ma, J., Deng, L.: Efficient decoding strategies for conversational speech recognition using a constrained nonlinear state-space model. IEEE Trans. Audio, Speech Lang. Process. 11(6), 590–602 (2004)
Google Scholar
Ma, J., Deng, L.: Target-directed mixture dynamic models for spontaneous speech recognition. IEEE Trans. Audio Speech Process. 12(1), 47–58 (2004)
Article Google Scholar
Maas, A.L., Le, Q., O’Neil, T.M., Vinyals, O., Nguyen, P., Ng, A.Y.: Recurrent neural networks for noise reduction in robust asr. In: Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH). Portland, OR (2012)
Google Scholar
Mesnil, G., He, X., Deng, L., Bengio, Y.: Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding. In: Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH). Lyon, France (2013)
Google Scholar
Mikolov, T.: Rnntoolkit http://www.fit.vutbr.cz/imikolov/rnnlm/ (2012). http://www.fit.vutbr.cz/~imikolov/rnnlm/
Mikolov, T.: Statistical Language Models Based on Neural Networks. Ph.D. thesis, Brno University of Technology (2012)
Google Scholar
Mikolov, T., Deoras, A., Povey, D., Burget, L., Cernocky, J.: Strategies for training large scale neural network language models. In: Proceedings of the IEEE Workshop on Automfatic Speech Recognition and Understanding (ASRU), pp. 196–201. IEEE, Honolulu, HI (2011)
Google Scholar
Mikolov, T., Karafiát, M., Burget, L., Cernockỳ, J., Khudanpur, S.: Recurrent neural network based language model. In: Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH), pp. 1045–1048. Makuhari, Japan (2010)
Google Scholar
Mikolov, T., Kombrink, S., Burget, L., Cernocky, J., Khudanpur, S.: Extensions of recurrent neural network language model. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5528–5531. Prague, Czech (2011)
Google Scholar
Mikolov, T., Zweig, G.: Context dependent recurrent neural network language model. In: Proceedings of the IEEE Spoken Language Technology Workshop (SLT), pp. 234–239 (2012)
Google Scholar
Mnih, A., Gregor, K.: Neural variational inference and learning in belief networks. In: Proceedings of the International Conference on Machine Learning (ICML) (2014)
Google Scholar
Mohamed, A.r., Dahl, G.E., Hinton, G.: Deep belief networks for phone recognition. In: NIPS Workshop on Deep Learning for Speech Recognition and Related Applications (2009)
Google Scholar
Ozkan, E., Ozbek, I., Demirekler, M.: Dynamic speech spectrum representation and tracking variable number of vocal tract resonance frequencies with time-varying dirichlet process mixture models. IEEE Trans. Audio, Speech Lang. Process. 17(8), 1518–1532 (2009)
Article Google Scholar
Pascanu, R., Gulcehre, C., Cho, K., Bengio, Y.: How to construct deep recurrent neural networks. In: The 2nd International Conference on Learning Representation (ICLR) (2014)
Google Scholar
Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neural networks. In: Proceedings of the International Conference on Machine Learning (ICML). Atlanta, GA (2013)
Google Scholar
Pavlovic, V., Frey, B., Huang, T.: Variational learning in mixed-state dynamic graphical models. In: UAI, pp. 522–530. Stockholm (1999)
Google Scholar
Picone, J., Pike, S., Regan, R., Kamm, T., bridle, J., Deng, L., Ma, Z., Richards, H., Schuster, M.: Initial evaluation of hidden dynamic models on conversational speech. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) (1999)
Google Scholar
Robinson, A.J.: An application of recurrent nets to phone probability estimation. IEEE Trans. Neural Netw. 5(2), 298–305 (1994)
Article Google Scholar
Robinson, A.J., Cook, G., Ellis, D.P., Fosler-Lussier, E., Renals, S., Williams, D.: Connectionist speech recognition of broadcast news. Speech Commun. 37(1), 27–45 (2002)
Article MATH Google Scholar
Rumelhart, D.E., Hintont, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323(6088), 533–536 (1986)
Article Google Scholar
Sainath, T., Kingsbury, B., Soltau, H., Ramabhadran, B.: Optimization techniques to improve training speed of deep neural networks for large speech tasks. IEEE Trans. Audio, Speech, Lang. Process. 21(11), 2267–2276 (2013)
Article Google Scholar
Sainath, T.N., Kingsbury, B., Mohamed, A.r., Dahl, G.E., Saon, G., Soltau, H., Beran, T., Aravkin, A.Y., Ramabhadran, B.: Improvements to deep convolutional neural networks for lvcsr. In: Proceedings of the IEEE Workshop on Automfatic Speech Recognition and Understanding (ASRU), pp. 315–320 (2013)
Google Scholar
Sainath, T.N., Kingsbury, B., Mohamed, A.r., Ramabhadran, B.: Learning filter banks within a deep neural network framework. In: Proceedings of the IEEE Workshop on Automfatic Speech Recognition and Understanding (ASRU) (2013)
Google Scholar
Sainath, T.N., Kingsbury, B., Sindhwani, V., Arisoy, E., Ramabhadran, B.: Low-rank matrix factorization for deep neural network training with high-dimensional output targets. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6655–6659 (2013)
Google Scholar
Sak, H., Senior, A., Beaufays, F.: Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In: Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH) (2014)
Google Scholar
Sak, H., Vinyals, O., Heigold, G., Senior, A., McDermott, E., Monga, R., Mao, M.: Sequence discriminative distributed training of long short-term memory recurrent neural networks. In: Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH) (2014)
Google Scholar
Schmidhuber, J.: Deep learning in neural networks: an overview. CoRR abs/1404.7828 (2014)
Google Scholar
Seide, F., Fu, H., Droppo, J., Li, G., Yu, D.: On parallelizability of stochastic gradient descent for speech dnns. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2014)
Google Scholar
Seide, F., Li, G., Yu, D.: Conversational speech transcription using context-dependent deep neural networks. In: Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH), pp. 437–440 (2011)
Google Scholar
Shen, X., Deng, L.: Maximum likelihood in statistical estimation of dynamical systems: Decomposition algorithm and simulation results. Signal Process. 57, 65–79 (1997)
Article MATH Google Scholar
Stevens, K.: Acoustic Phonetics. MIT Press (2000)
Google Scholar
Stoyanov, V., Ropson, A., Eisner, J.: Empirical risk minimization of graphical model parameters given approximate inference, decoding, and model structure. In: Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS) (2011)
Google Scholar
Sutskever, I.: Training Recurrent Neural Networks. Ph.D. thesis, University of Toronto (2013)
Google Scholar
Togneri, R., Deng, L.: Joint state and parameter estimation for a target-directed nonlinear dynamic system model. IEEE Trans. Signal Process. 51(12), 3061–3070 (2003)
Article MathSciNet Google Scholar
Togneri, R., Deng, L.: A state-space model with neural-network prediction for recovering vocal tract resonances in fluent speech from mel-cepstral coefficients. Speech Commun. 48(8), 971–988 (2006)
Article Google Scholar
Triefenbach, F., Jalalvand, A., Demuynck, K., Martens, J.P.: Acoustic modeling with hierarchical reservoirs. IEEE Trans. Audio, Speech, Lang. Process. 21(11), 2439–2450 (2013)
Article Google Scholar
Vanhoucke, V., Devin, M., Heigold, G.: Multiframe deep neural networks for acoustic modeling
Google Scholar
Vanhoucke, V., Senior, A., Mao, M.Z.: Improving the speed of neural networks on CPUs. In: Proceedings of the NIPS Workshop on Deep Learning and Unsupervised Feature Learning (2011)
Google Scholar
Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., Lang, K.J.: Phoneme recognition using time-delay neural networks. IEEE Trans. Speech Audio Process. 37(3), 328–339 (1989)
Article Google Scholar
Weninger, F., Geiger, J., Wollmer, M., Schuller, B., Rigoll, G.: Feature enhancement by deep lstm networks for ASR in reverberant multisource environments. Comput. Speech and Lang. 888–902 (2014)
Google Scholar
Xing, E., Jordan, M., Russell, S.: A generalized mean field algorithm for variational inference in exponential families. In: Proceedings of the Uncertainty in Artificial Intelligence (2003)
Google Scholar
Yu, D., Deng, L.: Speaker-adaptive learning of resonance targets in a hidden trajectory model of speech coarticulation. Comput. Speech Lang. 27, 72–87 (2007)
Article Google Scholar
Yu, D., Deng, L.: Deep-structured hidden conditional random fields for phonetic recognition. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2010)
Google Scholar
Yu, D., Deng, L., Acero, A.: A lattice search technique for a long-contextual-span hidden trajectory model of speech. Speech Commun. 48, 1214–1226 (2006)
Article Google Scholar
Yu, D., Deng, L., Dahl, G.: Roles of pre-training and fine-tuning in context-dependent DBN-HMMs for real-world speech recognition. In: Proceedings of the Neural Information Processing Systems (NIPS) Workshop on Deep Learning and Unsupervised Feature Learning (2010)
Google Scholar

Download references

Author information

Authors and Affiliations

Microsoft Research, Bothell, USA
Dong Yu
Microsoft Research, Redmond, WA, USA
Li Deng

Authors

Dong Yu
View author publications
You can also search for this author in PubMed Google Scholar
Li Deng
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Dong Yu .

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Yu, D., Deng, L. (2015). Recurrent Neural Networks and Related Models. In: Automatic Speech Recognition. Signals and Communication Technology. Springer, London. https://doi.org/10.1007/978-1-4471-5779-3_13

Download citation

DOI: https://doi.org/10.1007/978-1-4471-5779-3_13
Published: 12 November 2014
Publisher Name: Springer, London
Print ISBN: 978-1-4471-5778-6
Online ISBN: 978-1-4471-5779-3
eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics