Abstract
“Democracy is the recurrent suspicion that more than half the people are right more than half the time.”—The New Yorker, July 3, 1944.
Notes
- 1. A long short-term memory (LSTM) network was used, which is a variation on the vanilla RNN discussed here.
- 2. In the first layer, the matrix W(1) is of size 4p × (p + d) because it is multiplied with a vector of size (p + d); a dimension check is sketched in the code after these notes.
- 3. The bias associated with the forget gates is particularly important. The bias of the forget gate is generally initialized to values greater than 1 [228], because this seems to help in avoiding the vanishing gradient problem at initialization; a bias-initialization sketch follows these notes.
- 4. Here, we are treating the forget bits as a vector of binary bits, although the vector actually contains continuous values in (0, 1), which can be viewed as probabilities. As discussed earlier, the binary abstraction helps us understand the conceptual nature of the operations.
- 5. In the first layer (k = 1), these matrices are of sizes 2p × (p + d) and p × (p + d); a GRU dimension check is sketched after these notes.
- 6.
- 7. In principle, one can also allow it to be input at all time-stamps, but this only seems to worsen performance.
- 8.
- 9. Even though the adaptation from Section 2.5.7 is the most natural and obvious one, we have not seen it elsewhere in the literature. Therefore, it might be an interesting exercise for the reader to implement the adaptation of Exercise 3.
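The following is a minimal NumPy sketch of the dimension argument in note 2, assuming hypothetical sizes p = 64 and d = 100: a stacked 4p × (p + d) weight matrix multiplied with the concatenation of the previous hidden state and the current input yields the four p-dimensional gate pre-activations.

```python
import numpy as np

p, d = 64, 100                          # hypothetical hidden-state and input sizes
W1 = np.random.randn(4 * p, p + d)      # stacked weight matrix for the four LSTM gates

h_prev = np.random.randn(p)             # previous hidden state (size p)
x_t = np.random.randn(d)                # current input (size d)

z = W1 @ np.concatenate([h_prev, x_t])  # product with the (p + d)-dimensional vector
i, f, c, o = np.split(z, 4)             # four pre-activations, each of size p
assert z.shape == (4 * p,) and f.shape == (p,)
```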
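For note 3, the sketch below shows one way to set the forget-gate bias to a positive value in PyTorch; the layer sizes and the value 1.0 are illustrative assumptions rather than a prescription from the text, and [228] suggests values greater than 1.

```python
import torch
import torch.nn as nn

p, d = 64, 100                                # hypothetical hidden-state and input sizes
lstm = nn.LSTM(input_size=d, hidden_size=p)   # single-layer LSTM

with torch.no_grad():
    for name, param in lstm.named_parameters():
        if "bias" in name:             # bias_ih_l0 and bias_hh_l0, each of length 4p
            param.zero_()
            # forget-gate slice; PyTorch orders the gates as input, forget, cell, output
            param[p:2 * p].fill_(1.0)
```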
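Finally, a NumPy sketch for note 5 under one common GRU parameterization: a 2p × (p + d) matrix produces the update and reset gates, and a p × (p + d) matrix produces the candidate state; the sizes are again hypothetical.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

p, d = 64, 100                        # hypothetical hidden-state and input sizes
W = np.random.randn(2 * p, p + d)     # update and reset gates
V = np.random.randn(p, p + d)         # candidate hidden state

h_prev = np.random.randn(p)
x_t = np.random.randn(d)

z, r = np.split(sigmoid(W @ np.concatenate([h_prev, x_t])), 2)  # gates, each of size p
h_cand = np.tanh(V @ np.concatenate([r * h_prev, x_t]))         # candidate state of size p
h_t = z * h_prev + (1.0 - z) * h_cand                           # new hidden state of size p
assert h_t.shape == (p,)
```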
Bibliography
C. Aggarwal. Data mining: The textbook. Springer, 2015.
C. Aggarwal. Machine learning for text. Springer, 2018.
J. Ba, J. Kiros, and G. Hinton. Layer normalization. arXiv:1607.06450, 2016. https://arxiv.org/abs/1607.06450
P. Baldi, S. Brunak, P. Frasconi, G. Soda, and G. Pollastri. Exploiting the past and the future in protein secondary structure prediction. Bioinformatics, 15(11), pp. 937–946, 1999.
K. Cho, B. van Merrienboer, C. Gulcehre, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. EMNLP, 2014. https://arxiv.org/pdf/1406.1078.pdf
J. Chung, C. Gulcehre, K. Cho, and Y. Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv:1412.3555, 2014. https://arxiv.org/abs/1412.3555
J. Connor, R. Martin, and L. Atlas. Recurrent neural networks and robust time series prediction. IEEE Transactions on Neural Networks, 5(2), pp. 240–254, 1994.
T. Cooijmans, N. Ballas, C. Laurent, C. Gulcehre, and A. Courville. Recurrent batch normalization. arXiv:1603.09025, 2016. https://arxiv.org/abs/1603.09025
T. Cover. Geometrical and statistical properties of systems of linear inequalities with applications to pattern recognition. IEEE Transactions on Electronic Computers, pp. 326–334, 1965.
G. Dorffner. Neural networks for time series processing. Neural Network World, 1996.
K. Doya. Bifurcations of recurrent neural networks in gradient descent learning. IEEE Transactions on Neural Networks, 1, pp. 75–80, 1993.
J. Elman. Finding structure in time. Cognitive Science, 14(2), pp. 179–211, 1990.
A. Fader, L. Zettlemoyer, and O. Etzioni. Paraphrase-Driven Learning for Open Question Answering. ACL, pp. 1608–1618, 2013.
A. Fader, L. Zettlemoyer, and O. Etzioni. Open question answering over curated and extracted knowledge bases. ACM KDD Conference, 2014.
P. Frasconi, M. Gori, and A. Sperduti. A general framework for adaptive processing of data structures. IEEE Transactions on Neural Networks, 9(5), pp. 768–786, 1998.
X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. AISTATS, pp. 249–256, 2010.
Y. Goldberg. A primer on neural network models for natural language processing. Journal of Artificial Intelligence Research (JAIR), 57, pp. 345–420, 2016.
C. Goller and A. Küchler. Learning task-dependent distributed representations by backpropagation through structure. Neural Networks, 1, pp. 347–352, 1996.
A. Graves, A. Mohamed, and G. Hinton. Speech recognition with deep recurrent neural networks. Acoustics, Speech and Signal Processing (ICASSP), pp. 6645–6649, 2013.
A. Graves. Generating sequences with recurrent neural networks. arXiv:1308.0850, 2013. https://arxiv.org/abs/1308.0850
A. Graves. Supervised sequence labelling with recurrent neural networks. Springer, 2012. http://rd.springer.com/book/10.1007%2F978-3-642-24797-2
A. Graves, S. Fernandez, F. Gomez, and J. Schmidhuber. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. ICML Conference, pp. 369–376, 2006.
A. Graves, M. Liwicki, S. Fernandez, R. Bertolami, H. Bunke, and J. Schmidhuber. A novel connectionist system for unconstrained handwriting recognition. IEEE TPAMI, 31(5), pp. 855–868, 2009.
A. Graves and J. Schmidhuber. Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures. Neural Networks, 18(5–6), pp. 602–610, 2005.
A. Graves and J. Schmidhuber. Offline handwriting recognition with multidimensional recurrent neural networks. NIPS Conference, pp. 545–552, 2009.
A. Graves and N. Jaitly. Towards End-To-End Speech Recognition with Recurrent Neural Networks. ICML Conference, pp. 1764–1772, 2014.
K. Greff, R. K. Srivastava, J. Koutnik, B. Steunebrink, and J. Schmidhuber. LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems, 2016. http://ieeexplore.ieee.org/abstract/document/7508408/
S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8), pp. 1735–1780, 1997.
S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, A Field Guide to Dynamical Recurrent Neural Networks, IEEE Press, 2001.
S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167, 2015.
M. Iyyer, J. Boyd-Graber, L. Claudino, R. Socher, and H. Daume III. A Neural Network for Factoid Question Answering over Paragraphs. EMNLP, 2014.
H. Jaeger. The “echo state” approach to analysing and training recurrent neural networks – with an erratum note. German National Research Center for Information Technology GMD Technical Report, 148(34), 13, 2001.
H. Jaeger and H. Haas. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science, 304, pp. 78–80, 2004.
J. Johnson, A. Karpathy, and L. Fei-Fei. Densecap: Fully convolutional localization networks for dense captioning. IEEE Conference on Computer Vision and Pattern Recognition, pp. 4565–4574, 2015.
R. Jozefowicz, W. Zaremba, and I. Sutskever. An empirical exploration of recurrent network architectures. ICML Conference, pp. 2342–2350, 2015.
N. Kalchbrenner and P. Blunsom. Recurrent continuous translation models. EMNLP, 3, 39, pp. 413, 2013.
A. Karpathy, J. Johnson, and L. Fei-Fei. Visualizing and understanding recurrent networks. arXiv:1506.02078, 2015. https://arxiv.org/abs/1506.02078
A. Karpathy. The unreasonable effectiveness of recurrent neural networks. Blog post, 2015. http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Y. Koren. Collaborative filtering with temporal dynamics. ACM KDD Conference, pp. 447–455, 2009.
Q. Le, N. Jaitly, and G. Hinton. A simple way to initialize recurrent networks of rectified linear units. arXiv:1504.00941, 2015. https://arxiv.org/abs/1504.00941
Q. Liao, K. Kawaguchi, and T. Poggio. Streaming normalization: Towards simpler and more biologically-plausible normalizations for online and recurrent learning. arXiv:1610.06160, 2016. https://arxiv.org/abs/1610.06160
Z. Lipton, J. Berkowitz, and C. Elkan. A critical review of recurrent neural networks for sequence learning. arXiv:1506.00019, 2015. https://arxiv.org/abs/1506.00019
M. Lukosevicius and H. Jaeger. Reservoir computing approaches to recurrent neural network training. Computer Science Review, 3(3), pp. 127–149, 2009.
W. Maass, T. Natschlager, and H. Markram. Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation, 14(11), pp. 2531–2560, 2002.
J. Martens. Deep learning via Hessian-free optimization. ICML Conference, pp. 735–742, 2010.
J. Martens and I. Sutskever. Learning recurrent neural networks with hessian-free optimization. ICML Conference, pp. 1033–1040, 2011.
T. Mikolov. Statistical language models based on neural networks. Ph.D. thesis, Brno University of Technology, 2012.
T. Mikolov, A. Joulin, S. Chopra, M. Mathieu, and M. Ranzato. Learning longer memory in recurrent neural networks. arXiv:1412.7753, 2014. https://arxiv.org/abs/1412.7753
T. Mikolov, M. Karafiat, L. Burget, J. Cernocky, and S. Khudanpur. Recurrent neural network based language model. Interspeech, Vol 2, 2010.
R. Pascanu, T. Mikolov, and Y. Bengio. On the difficulty of training recurrent neural networks. ICML Conference, 28, pp. 1310–1318, 2013.
R. Pascanu, T. Mikolov, and Y. Bengio. Understanding the exploding gradient problem. CoRR, abs/1211.5063, 2012.
F. Pineda. Generalization of back-propagation to recurrent neural networks. Physical Review Letters, 59(19), 2229, 1987.
L. Polanyi and A. Zaenen. Contextual valence shifters. Computing Attitude and Affect in Text: Theory and Applications, pp. 1–10, Springer, 2006.
G. Pollastri, D. Przybylski, B. Rost, and P. Baldi. Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. Proteins: Structure, Function, and Bioinformatics, 47(2), pp. 228–235, 2002.
J. Pollack. Recursive distributed representations. Artificial Intelligence, 46(1), pp. 77–105, 1990.
A. Rahimi and B. Recht. Random features for large-scale kernel machines. NIPS Conference, pp. 1177–1184, 2008.
M. Schuster and K. Paliwal. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11), pp. 2673–2681, 1997.
I. Serban, A. Sordoni, R. Lowe, L. Charlin, J. Pineau, A. Courville, and Y. Bengio. A hierarchical latent variable encoder-decoder model for generating dialogues. AAAI, pp. 3295–3301, 2017.
I. Serban, A. Sordoni, Y. Bengio, A. Courville, and J. Pineau. Building end-to-end dialogue systems using generative hierarchical neural network models. AAAI Conference, pp. 3776–3784, 2016.
H. Siegelmann and E. Sontag. On the computational power of neural nets. Journal of Computer and System Sciences, 50(1), pp. 132–150, 1995.
R. Socher, C. Lin, C. Manning, and A. Ng. Parsing natural scenes and natural language with recursive neural networks. ICML Conference, pp. 129–136, 2011.
R. Socher, J. Pennington, E. Huang, A. Ng, and C. Manning. Semi-supervised recursive autoencoders for predicting sentiment distributions. Empirical Methods in Natural Language Processing (EMNLP), pp. 151–161, 2011.
R. Socher, A. Perelygin, J. Wu, J. Chuang, C. Manning, A. Ng, and C. Potts. Recursive deep models for semantic compositionality over a sentiment treebank. Empirical Methods in Natural Language Processing (EMNLP), p. 1642, 2013.
Y. Song, A. Elkahky, and X. He. Multi-rate deep learning for temporal recommendation. ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 909–912, 2016.
M. Sundermeyer, R. Schluter, and H. Ney. LSTM neural networks for language modeling. Interspeech, 2012.
M. Sundermeyer, T. Alkhouli, J. Wuebker, and H. Ney. Translation modeling with bidirectional recurrent neural networks. EMNLP, pp. 14–25, 2014.
I. Sutskever, J. Martens, G. Dahl, and G. Hinton. On the importance of initialization and momentum in deep learning. ICML Conference, pp. 1139–1147, 2013.
I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. NIPS Conference, pp. 3104–3112, 2014.
V. Veeriah, N. Zhuang, and G. Qi. Differential recurrent neural networks for action recognition. IEEE International Conference on Computer Vision, pp. 4041–4049, 2015.
O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. CVPR Conference, pp. 3156–3164, 2015.
S. Wang, C. Aggarwal, and H. Liu. Randomized feature engineering as a fast and accurate alternative to kernel methods. ACM KDD Conference, 2017.
P. Werbos. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10), pp. 1550–1560, 1990.
J. Weston, A. Bordes, S. Chopra, A. Rush, B. van Merrienboer, A. Joulin, and T. Mikolov. Towards ai-complete question answering: A set of pre-requisite toy tasks. arXiv:1502.05698, 2015. https://arxiv.org/abs/1502.05698
J. Weston, S. Chopra, and A. Bordes. Memory networks. ICLR, 2015.
C. Wu, A. Ahmed, A. Beutel, A. Smola, and H. Jing. Recurrent recommender networks. ACM International Conference on Web Search and Data Mining, pp. 495–503, 2017.
W. Zaremba, I. Sutskever, and O. Vinyals. Recurrent neural network regularization. arXiv:1409.2329, 2014.
L. Zhang, C. Aggarwal, and G.-J. Qi. Stock Price Prediction via Discovering Multi-Frequency Trading Patterns. ACM KDD Conference, 2017.
S. Zhang, L. Yao, and A. Sun. Deep learning based recommender system: A survey and new perspectives. arXiv:1707.07435, 2017. https://arxiv.org/abs/1707.07435
M. Zhou, Z. Ding, J. Tang, and D. Yin. Micro Behaviors: A new perspective in e-commerce recommender systems. WSDM Conference, 2018.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Aggarwal, C.C. (2018). Recurrent Neural Networks. In: Neural Networks and Deep Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-94463-0_7
DOI: https://doi.org/10.1007/978-3-319-94463-0_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94462-3
Online ISBN: 978-3-319-94463-0