Recurrent Neural Networks

Aggarwal, Charu C.

doi:10.1007/978-3-319-94463-0_7

Charu C. Aggarwal²

429k Accesses
5 Citations
3 Altmetric

Abstract

“Democracy is the recurrent suspicion that more than half the people are right more than half the time.”—The New Yorker, July 3, 1944.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
A long-short term memory network (LSTM) was used, which is a variation on the vanilla RNN discussed here.
2.
In the first layer, the matrix W ⁽¹⁾ is of size 4p × (p + d) because it is multiplied with a vector of size (p + d).
3.
The bias associated with the forget gates is particularly important. The bias of the forget gate is generally initialized to values greater than 1 [228] because it seems to avoid the vanishing gradient problem at initialization.
4.
Here, we are treating the forget bits as a vector of binary bits, although it contains continuous values in (0, 1), which can be viewed as probabilities. As discussed earlier, the binary abstraction helps us understand the conceptual nature of the operations.
5.
In the first layer (k = 1), these matrices are of sizes 2p × (p + d) and p × (p + d).
6.
https://www.nasa.gov/mission_pages/chandra/cosmic-winter-wonderland.html
7.
In principle, one can also allow it to be input at all time-stamps, but it only seems to worsen performance.
8.
The original work in [478] seems to use this option. In the Google Neural Machine Translation system [579], this weight is removed. This system is now used in Google Translate.
9.
Even though the adaptation from Section 2.5.7 is the most natural and obvious one, we have not seen it elsewhere in the literature. Therefore, it might be an interesting exercise for the reader to implement the adaptation of Exercise 3.

Bibliography

C. Aggarwal. Data mining: The textbook. Springer, 2015.
Google Scholar
C. Aggarwal. Machine learning for text. Springer, 2018.
Google Scholar
J. Ba, J. Kiros, and G. Hinton. Layer normalization. arXiv:1607.06450, 2016.https://arxiv.org/abs/1607.06450
P. Baldi, S. Brunak, P. Frasconi, G. Soda, and G. Pollastri. Exploiting the past and the future in protein secondary structure prediction. Bioinformatics, 15(11), pp. 937–946, 1999.
Article Google Scholar
K. Cho, B. Merrienboer, C. Gulcehre, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. EMNLP, 2014. https://arxiv.org/pdf/1406.1078.pdf
J. Chung, C. Gulcehre, K. Cho, and Y. Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv:1412.3555, 2014.https://arxiv.org/abs/1412.3555
J. Connor, R. Martin, and L. Atlas. Recurrent neural networks and robust time series prediction. IEEE Transactions on Neural Networks, 5(2), pp. 240–254, 1994.
Article Google Scholar
T. Cooijmans, N. Ballas, C. Laurent, C. Gulcehre, and A. Courville. Recurrent batch normalization. arXiv:1603.09025, 2016.https://arxiv.org/abs/1603.09025
T. Cover. Geometrical and statistical properties of systems of linear inequalities with applications to pattern recognition. IEEE Transactions on Electronic Computers, pp. 326–334, 1965.
Google Scholar
G. Dorffner. Neural networks for time series processing. Neural Network World, 1996.
Google Scholar
K. Doya. Bifurcations of recurrent neural networks in gradient descent learning. IEEE Transactions on Neural Networks, 1, pp. 75–80, 1993.
Google Scholar
J. Elman. Finding structure in time. Cognitive Science, 14(2), pp. 179–211, 1990.
Article Google Scholar
A. Fader, L. Zettlemoyer, and O. Etzioni. Paraphrase-Driven Learning for Open Question Answering. ACL, pp. 1608–1618, 2013.
Google Scholar
A. Fader, L. Zettlemoyer, and O. Etzioni. Open question answering over curated and extracted knowledge bases. ACM KDD Conference, 2014.
Google Scholar
P. Frasconi, M. Gori, and A. Sperduti. A general framework for adaptive processing of data structures. IEEE Transactions on Neural Networks, 9(5), pp. 768–786, 1998.
Article Google Scholar
X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. AISTATS, pp. 249–256, 2010.
Google Scholar
Y. Goldberg. A primer on neural network models for natural language processing. Journal of Artificial Intelligence Research (JAIR), 57, pp. 345–420, 2016.
Article MathSciNet Google Scholar
C. Goller and A. Küchler. Learning task-dependent distributed representations by backpropagation through structure. Neural Networks, 1, pp. 347–352, 1996.
Google Scholar
A. Graves, A. Mohamed, and G. Hinton. Speech recognition with deep recurrent neural networks. Acoustics, Speech and Signal Processing (ICASSP), pp. 6645–6649, 2013.
Google Scholar
A. Graves. Generating sequences with recurrent neural networks. arXiv:1308.0850, 2013.https://arxiv.org/abs/1308.0850
A. Graves. Supervised sequence labelling with recurrent neural networks Springer, 2012.http://rd.springer.com/book/10.1007%2F978-3-642-24797-2
A. Graves, S. Fernandez, F. Gomez, and J. Schmidhuber. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. ICML Confererence, pp. 369–376, 2006.
Google Scholar
A. Graves, M. Liwicki, S. Fernandez, R. Bertolami, H. Bunke, and J. Schmidhuber. A novel connectionist system for unconstrained handwriting recognition. IEEE TPAMI, 31(5), pp. 855–868, 2009.
Article Google Scholar
A. Graves and J. Schmidhuber. Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures. Neural Networks, 18(5–6), pp. 602–610, 2005.
Article Google Scholar
A. Graves and J. Schmidhuber. Offline handwriting recognition with multidimensional recurrent neural networks. NIPS Conference, pp. 545–552, 2009.
Google Scholar
A. Graves and N. Jaitly. Towards End-To-End Speech Recognition with Recurrent Neural Networks. ICML Conference, pp. 1764–1772, 2014.
Google Scholar
K. Greff, R. K. Srivastava, J. Koutnik, B. Steunebrink, and J. Schmidhuber. LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems, 2016.http://ieeexplore.ieee.org/abstract/document/7508408/
S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8), pp. 1735–1785, 1997.
Article Google Scholar
S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, A Field Guide to Dynamical Recurrent Neural Networks, IEEE Press, 2001.
Google Scholar
S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167, 2015.
Google Scholar
M. Iyyer, J. Boyd-Graber, L. Claudino, R. Socher, and H. Daume III. A Neural Network for Factoid Question Answering over Paragraphs. EMNLP, 2014.
Google Scholar
H. Jaeger. The “echo state” approach to analysing and training recurrent neural networks – with an erratum note. German National Research Center for Information Technology GMD Technical Report, 148(34), 13, 2001.
Google Scholar
H. Jaeger and H. Haas. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science, 304, pp. 78–80, 2004.
Article Google Scholar
J. Johnson, A. Karpathy, and L. Fei-Fei. Densecap: Fully convolutional localization networks for dense captioning. IEEE Conference on Computer Vision and Pattern Recognition, pp. 4565–4574, 2015.
Google Scholar
R. Jozefowicz, W. Zaremba, and I. Sutskever. An empirical exploration of recurrent network architectures. ICML Confererence, pp. 2342–2350, 2015.
Google Scholar
N. Kalchbrenner and P. Blunsom. Recurrent continuous translation models. EMNLP, 3, 39, pp. 413, 2013.
Google Scholar
A. Karpathy, J. Johnson, and L. Fei-Fei. Visualizing and understanding recurrent networks. arXiv:1506.02078, 2015.https://arxiv.org/abs/1506.02078
A. Karpathy. The unreasonable effectiveness of recurrent neural networks, Blog post, 2015.http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Y. Koren. Collaborative filtering with temporal dynamics. ACM KDD Conference, pp. 447–455, 2009.
Google Scholar
Q. Le, N. Jaitly, and G. Hinton. A simple way to initialize recurrent networks of rectified linear units. arXiv:1504.00941, 2015.https://arxiv.org/abs/1504.00941
Q. Liao, K. Kawaguchi, and T. Poggio. Streaming normalization: Towards simpler and more biologically-plausible normalizations for online and recurrent learning. arXiv:1610.06160, 2016. https://arxiv.org/abs/1610.06160
Z. Lipton, J. Berkowitz, and C. Elkan. A critical review of recurrent neural networks for sequence learning. arXiv:1506.00019, 2015.https://arxiv.org/abs/1506.00019
M. Lukosevicius and H. Jaeger. Reservoir computing approaches to recurrent neural network training. Computer Science Review, 3(3), pp. 127–149, 2009.
Article Google Scholar
W. Maass, T. Natschlager, and H. Markram. Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation, 14(11), pp. 2351–2560, 2002.
Article Google Scholar
J. Martens. Deep learning via Hessian-free optimization. ICML Conference, pp. 735–742, 2010.
Google Scholar
J. Martens and I. Sutskever. Learning recurrent neural networks with hessian-free optimization. ICML Conference, pp. 1033–1040, 2011.
Google Scholar
T. Mikolov. Statistical language models based on neural networks. Ph.D. thesis, Brno University of Technology, 2012.
Google Scholar
T. Mikolov, A. Joulin, S. Chopra, M. Mathieu, and M. Ranzato. Learning longer memory in recurrent neural networks. arXiv:1412.7753, 2014.https://arxiv.org/abs/1412.7753
T. Mikolov, M. Karafiat, L. Burget, J. Cernocky, and S. Khudanpur. Recurrent neural network based language model. Interspeech, Vol 2, 2010.
Google Scholar
R. Pascanu, T. Mikolov, and Y. Bengio. On the difficulty of training recurrent neural networks. ICML Conference, 28, pp. 1310–1318, 2013.
Google Scholar
R. Pascanu, T. Mikolov, and Y. Bengio. Understanding the exploding gradient problem. CoRR, abs/1211.5063, 2012.
Google Scholar
F. Pineda. Generalization of back-propagation to recurrent neural networks. Physical Review Letters, 59(19), 2229, 1987.
Google Scholar
L. Polanyi and A. Zaenen. Contextual valence shifters. Computing Attitude and Affect in Text: Theory and Applications, pp. 1–10, Springer, 2006.
Google Scholar
G. Pollastri, D. Przybylski, B. Rost, and P. Baldi. Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. Proteins: Structure, Function, and Bioinformatics, 47(2), pp. 228–235, 2002.
Article Google Scholar
J. Pollack. Recursive distributed representations. Artificial Intelligence, 46(1), pp. 77–105, 1990.
Article Google Scholar
A. Rahimi and B. Recht. Random features for large-scale kernel machines. NIPS Conference, pp. 1177–1184, 2008.
Google Scholar
M. Schuster and K. Paliwal. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11), pp. 2673–2681, 1997.
Article Google Scholar
I. Serban, A. Sordoni, R. Lowe, L. Charlin, J. Pineau, A. Courville, and Y. Bengio. A hierarchical latent variable encoder-decoder model for generating dialogues. AAAI, pp. 3295–3301, 2017.
Google Scholar
I. Serban, A. Sordoni, Y. Bengio, A. Courville, and J. Pineau. Building end-to-end dialogue systems using generative hierarchical neural network models. AAAI Conference, pp. 3776–3784, 2016.
Google Scholar
H. Siegelmann and E. Sontag. On the computational power of neural nets. Journal of Computer and System Sciences, 50(1), pp. 132–150, 1995.
Article MathSciNet Google Scholar
R. Socher, C. Lin, C. Manning, and A. Ng. Parsing natural scenes and natural language with recursive neural networks. ICML Confererence, pp. 129–136, 2011.
Google Scholar
R. Socher, J. Pennington, E. Huang, A. Ng, and C. Manning. Semi-supervised recursive autoencoders for predicting sentiment distributions. Empirical Methods in Natural Language Processing (EMNLP), pp. 151–161, 2011.
Google Scholar
R. Socher, A. Perelygin, J. Wu, J. Chuang, C. Manning, A. Ng, and C. Potts. Recursive deep models for semantic compositionality over a sentiment treebank. Empirical Methods in Natural Language Processing (EMNLP), p. 1642, 2013.
Google Scholar
Y. Song, A. Elkahky, and X. He. Multi-rate deep learning for temporal recommendation. ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 909–912, 2016.
Google Scholar
M. Sundermeyer, R. Schluter, and H. Ney. LSTM neural networks for language modeling. Interspeech, 2010.
Google Scholar
M. Sundermeyer, T. Alkhouli, J. Wuebker, and H. Ney. Translation modeling with bidirectional recurrent neural networks. EMNLP, pp. 14–25, 2014.
Google Scholar
I. Sutskever, J. Martens, G. Dahl, and G. Hinton. On the importance of initialization and momentum in deep learning. ICML Confererence, pp. 1139–1147, 2013.
Google Scholar
I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. NIPS Conference, pp. 3104–3112, 2014.
Google Scholar
V. Veeriah, N. Zhuang, and G. Qi. Differential recurrent neural networks for action recognition. IEEE International Conference on Computer Vision, pp. 4041–4049, 2015.
Google Scholar
O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. CVPR Conference, pp. 3156–3164, 2015.
Google Scholar
S. Wang, C. Aggarwal, and H. Liu. Randomized feature engineering as a fast and accurate alternative to kernel methods. ACM KDD Conference, 2017.
Google Scholar
P. Werbos. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10), pp. 1550–1560, 1990.
Article Google Scholar
J. Weston, A. Bordes, S. Chopra, A. Rush, B. van Merrienboer, A. Joulin, and T. Mikolov. Towards ai-complete question answering: A set of pre-requisite toy tasks. arXiv:1502.05698, 2015. https://arxiv.org/abs/1502.05698
J. Weston, S. Chopra, and A. Bordes. Memory networks. ICLR, 2015.
Google Scholar
C. Wu, A. Ahmed, A. Beutel, A. Smola, and H. Jing. Recurrent recommender networks. ACM International Conference on Web Search and Data Mining, pp. 495–503, 2017.
Google Scholar
W. Zaremba, I. Sutskever, and O. Vinyals. Recurrent neural network regularization. arXiv:1409.2329, 2014.
Google Scholar
L. Zhang, C. Aggarwal, and G.-J. Qi. Stock Price Prediction via Discovering Multi-Frequency Trading Patterns. ACM KDD Conference, 2017.
Google Scholar
S. Zhang, L. Yao, and A. Sun. Deep learning based recommender system: A survey and new perspectives. arXiv:1707.07435, 2017.https://arxiv.org/abs/1707.07435
M. Zhou, Z. Ding, J. Tang, and D. Yin. Micro Behaviors: A new perspective in e-commerce recommender systems. WSDM Conference, 2018.
Google Scholar
http://caffe.berkeleyvision.org/
http://torch.ch/
http://deeplearning.net/software/theano/
https://www.tensorflow.org/
http://deeplearning.net/tutorial/lstm.html
https://arxiv.org/abs/1609.08144
https://github.com/karpathy/char-rnn
https://deeplearning4j.org/lstm
http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Download references

Author information

Authors and Affiliations

IBM T. J. Watson Research Center, International Business Machines, Yorktown Heights, NY, USA
Charu C. Aggarwal

Authors

Charu C. Aggarwal
View author publications
You can also search for this author in PubMed Google Scholar

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Aggarwal, C.C. (2018). Recurrent Neural Networks. In: Neural Networks and Deep Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-94463-0_7

Download citation

DOI: https://doi.org/10.1007/978-3-319-94463-0_7
Published: 26 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94462-3
Online ISBN: 978-3-319-94463-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics