Recurrent Neural Networks

Abstract

“Democracy is the recurrent suspicion that more than half the people are right more than half the time.”—The New Yorker, July 3, 1944.


Notes

  1. A long short-term memory (LSTM) network was used, which is a variation on the vanilla RNN discussed here.

  2. In the first layer, the matrix W^(1) is of size 4p × (p + d) because it is multiplied with a vector of size (p + d).

  3. The bias associated with the forget gates is particularly important. The bias of the forget gate is generally initialized to values greater than 1 [228] because it seems to avoid the vanishing gradient problem at initialization.

  4. Here, we are treating the forget bits as a vector of binary bits, although it contains continuous values in (0, 1), which can be viewed as probabilities. As discussed earlier, the binary abstraction helps us understand the conceptual nature of the operations.

  5. In the first layer (k = 1), these matrices are of sizes 2p × (p + d) and p × (p + d).

  6. https://www.nasa.gov/mission_pages/chandra/cosmic-winter-wonderland.html

  7. In principle, one can also allow it to be input at all time-stamps, but it only seems to worsen performance.

  8. The original work in [478] seems to use this option. In the Google Neural Machine Translation system [579], this weight is removed. This system is now used in Google Translate.

  9. Even though the adaptation from Section 2.5.7 is the most natural and obvious one, we have not seen it elsewhere in the literature. Therefore, it might be an interesting exercise for the reader to implement the adaptation of Exercise 3.
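
The sizes and initialization mentioned in the notes on the first-layer LSTM matrix, the forget-gate bias, and the forget values can be made concrete with a small sketch. The pure-Python LSTM step below is an illustration under stated assumptions, not the chapter's implementation: the gate ordering (input, forget, output, candidate), the order in which the hidden state and the input are concatenated, the uniform weight range, and the forget-bias value of 2.0 are all choices made here for illustration. The names `init_lstm_layer` and `lstm_step` are hypothetical.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def init_lstm_layer(p, d, forget_bias=2.0, seed=0):
    """Return (W, b) for a first-layer LSTM: W is 4p x (p + d), b has 4p entries.

    forget_bias=2.0 is an arbitrary illustrative value greater than 1.
    """
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(p + d)] for _ in range(4 * p)]
    b = [0.0] * (4 * p)
    # Assumed row layout: [input gate | forget gate | output gate | candidate].
    for r in range(p, 2 * p):
        b[r] = forget_bias
    return W, b


def lstm_step(W, b, x, h_prev, c_prev):
    """One LSTM time step; returns the new hidden state, cell state, and forget values."""
    p = len(h_prev)
    z = h_prev + x  # concatenated vector of size p + d (order is an assumption)
    pre = [sum(w * v for w, v in zip(row, z)) + bias for row, bias in zip(W, b)]
    i = [sigmoid(a) for a in pre[0:p]]
    f = [sigmoid(a) for a in pre[p:2 * p]]  # continuous values in (0, 1)
    o = [sigmoid(a) for a in pre[2 * p:3 * p]]
    g = [math.tanh(a) for a in pre[3 * p:4 * p]]
    # Forget values scale the old cell state; input values scale the candidate.
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c, f


p, d = 3, 2
W, b = init_lstm_layer(p, d)
h, c, f = lstm_step(W, b, [0.5, -0.5], [0.0] * p, [0.0] * p)
print(len(W), len(W[0]))  # rows and columns of W: 4p and p + d
print(f)                  # forget values, each strictly between 0 and 1
```

With a positive forget bias and small initial weights, the forget values start close to 1, so the cell state is mostly carried forward at initialization. This is one way to read the remark about avoiding vanishing gradients early in training.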

Bibliography

  1. C. Aggarwal. Data mining: The textbook. Springer, 2015.

  2. C. Aggarwal. Machine learning for text. Springer, 2018.

  3. J. Ba, J. Kiros, and G. Hinton. Layer normalization. arXiv:1607.06450, 2016. https://arxiv.org/abs/1607.06450

  4. P. Baldi, S. Brunak, P. Frasconi, G. Soda, and G. Pollastri. Exploiting the past and the future in protein secondary structure prediction. Bioinformatics, 15(11), pp. 937–946, 1999.

  5. K. Cho, B. Merrienboer, C. Gulcehre, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. EMNLP, 2014. https://arxiv.org/pdf/1406.1078.pdf

  6. J. Chung, C. Gulcehre, K. Cho, and Y. Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv:1412.3555, 2014. https://arxiv.org/abs/1412.3555

  7. J. Connor, R. Martin, and L. Atlas. Recurrent neural networks and robust time series prediction. IEEE Transactions on Neural Networks, 5(2), pp. 240–254, 1994.

  8. T. Cooijmans, N. Ballas, C. Laurent, C. Gulcehre, and A. Courville. Recurrent batch normalization. arXiv:1603.09025, 2016. https://arxiv.org/abs/1603.09025

  9. T. Cover. Geometrical and statistical properties of systems of linear inequalities with applications to pattern recognition. IEEE Transactions on Electronic Computers, pp. 326–334, 1965.

  10. G. Dorffner. Neural networks for time series processing. Neural Network World, 1996.

  11. K. Doya. Bifurcations of recurrent neural networks in gradient descent learning. IEEE Transactions on Neural Networks, 1, pp. 75–80, 1993.

  12. J. Elman. Finding structure in time. Cognitive Science, 14(2), pp. 179–211, 1990.

  13. A. Fader, L. Zettlemoyer, and O. Etzioni. Paraphrase-Driven Learning for Open Question Answering. ACL, pp. 1608–1618, 2013.

  14. A. Fader, L. Zettlemoyer, and O. Etzioni. Open question answering over curated and extracted knowledge bases. ACM KDD Conference, 2014.

  15. P. Frasconi, M. Gori, and A. Sperduti. A general framework for adaptive processing of data structures. IEEE Transactions on Neural Networks, 9(5), pp. 768–786, 1998.

  16. X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. AISTATS, pp. 249–256, 2010.

  17. Y. Goldberg. A primer on neural network models for natural language processing. Journal of Artificial Intelligence Research (JAIR), 57, pp. 345–420, 2016.

  18. C. Goller and A. Küchler. Learning task-dependent distributed representations by backpropagation through structure. Neural Networks, 1, pp. 347–352, 1996.

  19. A. Graves, A. Mohamed, and G. Hinton. Speech recognition with deep recurrent neural networks. Acoustics, Speech and Signal Processing (ICASSP), pp. 6645–6649, 2013.

  20. A. Graves. Generating sequences with recurrent neural networks. arXiv:1308.0850, 2013. https://arxiv.org/abs/1308.0850

  21. A. Graves. Supervised sequence labelling with recurrent neural networks. Springer, 2012. http://rd.springer.com/book/10.1007%2F978-3-642-24797-2

  22. A. Graves, S. Fernandez, F. Gomez, and J. Schmidhuber. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. ICML Conference, pp. 369–376, 2006.

  23. A. Graves, M. Liwicki, S. Fernandez, R. Bertolami, H. Bunke, and J. Schmidhuber. A novel connectionist system for unconstrained handwriting recognition. IEEE TPAMI, 31(5), pp. 855–868, 2009.

  24. A. Graves and J. Schmidhuber. Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures. Neural Networks, 18(5–6), pp. 602–610, 2005.

  25. A. Graves and J. Schmidhuber. Offline handwriting recognition with multidimensional recurrent neural networks. NIPS Conference, pp. 545–552, 2009.

  26. A. Graves and N. Jaitly. Towards End-To-End Speech Recognition with Recurrent Neural Networks. ICML Conference, pp. 1764–1772, 2014.

  27. K. Greff, R. K. Srivastava, J. Koutnik, B. Steunebrink, and J. Schmidhuber. LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems, 2016. http://ieeexplore.ieee.org/abstract/document/7508408/

  28. S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8), pp. 1735–1780, 1997.

  29. S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, A Field Guide to Dynamical Recurrent Neural Networks, IEEE Press, 2001.

  30. S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167, 2015.

  31. M. Iyyer, J. Boyd-Graber, L. Claudino, R. Socher, and H. Daume III. A Neural Network for Factoid Question Answering over Paragraphs. EMNLP, 2014.

  32. H. Jaeger. The “echo state” approach to analysing and training recurrent neural networks – with an erratum note. German National Research Center for Information Technology GMD Technical Report, 148(34), 13, 2001.

  33. H. Jaeger and H. Haas. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science, 304, pp. 78–80, 2004.

  34. J. Johnson, A. Karpathy, and L. Fei-Fei. Densecap: Fully convolutional localization networks for dense captioning. IEEE Conference on Computer Vision and Pattern Recognition, pp. 4565–4574, 2015.

  35. R. Jozefowicz, W. Zaremba, and I. Sutskever. An empirical exploration of recurrent network architectures. ICML Conference, pp. 2342–2350, 2015.

  36. N. Kalchbrenner and P. Blunsom. Recurrent continuous translation models. EMNLP, 3, 39, pp. 413, 2013.

  37. A. Karpathy, J. Johnson, and L. Fei-Fei. Visualizing and understanding recurrent networks. arXiv:1506.02078, 2015. https://arxiv.org/abs/1506.02078

  38. A. Karpathy. The unreasonable effectiveness of recurrent neural networks. Blog post, 2015. http://karpathy.github.io/2015/05/21/rnn-effectiveness/

  39. Y. Koren. Collaborative filtering with temporal dynamics. ACM KDD Conference, pp. 447–455, 2009.

  40. Q. Le, N. Jaitly, and G. Hinton. A simple way to initialize recurrent networks of rectified linear units. arXiv:1504.00941, 2015. https://arxiv.org/abs/1504.00941

  41. Q. Liao, K. Kawaguchi, and T. Poggio. Streaming normalization: Towards simpler and more biologically-plausible normalizations for online and recurrent learning. arXiv:1610.06160, 2016. https://arxiv.org/abs/1610.06160

  42. Z. Lipton, J. Berkowitz, and C. Elkan. A critical review of recurrent neural networks for sequence learning. arXiv:1506.00019, 2015. https://arxiv.org/abs/1506.00019

  43. M. Lukosevicius and H. Jaeger. Reservoir computing approaches to recurrent neural network training. Computer Science Review, 3(3), pp. 127–149, 2009.

  44. W. Maass, T. Natschlager, and H. Markram. Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation, 14(11), pp. 2531–2560, 2002.

  45. J. Martens. Deep learning via Hessian-free optimization. ICML Conference, pp. 735–742, 2010.

  46. J. Martens and I. Sutskever. Learning recurrent neural networks with hessian-free optimization. ICML Conference, pp. 1033–1040, 2011.

  47. T. Mikolov. Statistical language models based on neural networks. Ph.D. thesis, Brno University of Technology, 2012.

  48. T. Mikolov, A. Joulin, S. Chopra, M. Mathieu, and M. Ranzato. Learning longer memory in recurrent neural networks. arXiv:1412.7753, 2014. https://arxiv.org/abs/1412.7753

  49. T. Mikolov, M. Karafiat, L. Burget, J. Cernocky, and S. Khudanpur. Recurrent neural network based language model. Interspeech, Vol 2, 2010.

  50. R. Pascanu, T. Mikolov, and Y. Bengio. On the difficulty of training recurrent neural networks. ICML Conference, 28, pp. 1310–1318, 2013.

  51. R. Pascanu, T. Mikolov, and Y. Bengio. Understanding the exploding gradient problem. CoRR, abs/1211.5063, 2012.

  52. F. Pineda. Generalization of back-propagation to recurrent neural networks. Physical Review Letters, 59(19), 2229, 1987.

  53. L. Polanyi and A. Zaenen. Contextual valence shifters. Computing Attitude and Affect in Text: Theory and Applications, pp. 1–10, Springer, 2006.

  54. G. Pollastri, D. Przybylski, B. Rost, and P. Baldi. Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. Proteins: Structure, Function, and Bioinformatics, 47(2), pp. 228–235, 2002.

  55. J. Pollack. Recursive distributed representations. Artificial Intelligence, 46(1), pp. 77–105, 1990.

  56. A. Rahimi and B. Recht. Random features for large-scale kernel machines. NIPS Conference, pp. 1177–1184, 2008.

  57. M. Schuster and K. Paliwal. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11), pp. 2673–2681, 1997.

  58. I. Serban, A. Sordoni, R. Lowe, L. Charlin, J. Pineau, A. Courville, and Y. Bengio. A hierarchical latent variable encoder-decoder model for generating dialogues. AAAI, pp. 3295–3301, 2017.

  59. I. Serban, A. Sordoni, Y. Bengio, A. Courville, and J. Pineau. Building end-to-end dialogue systems using generative hierarchical neural network models. AAAI Conference, pp. 3776–3784, 2016.

  60. H. Siegelmann and E. Sontag. On the computational power of neural nets. Journal of Computer and System Sciences, 50(1), pp. 132–150, 1995.

  61. R. Socher, C. Lin, C. Manning, and A. Ng. Parsing natural scenes and natural language with recursive neural networks. ICML Conference, pp. 129–136, 2011.

  62. R. Socher, J. Pennington, E. Huang, A. Ng, and C. Manning. Semi-supervised recursive autoencoders for predicting sentiment distributions. Empirical Methods in Natural Language Processing (EMNLP), pp. 151–161, 2011.

  63. R. Socher, A. Perelygin, J. Wu, J. Chuang, C. Manning, A. Ng, and C. Potts. Recursive deep models for semantic compositionality over a sentiment treebank. Empirical Methods in Natural Language Processing (EMNLP), p. 1642, 2013.

  64. Y. Song, A. Elkahky, and X. He. Multi-rate deep learning for temporal recommendation. ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 909–912, 2016.

  65. M. Sundermeyer, R. Schluter, and H. Ney. LSTM neural networks for language modeling. Interspeech, 2012.

  66. M. Sundermeyer, T. Alkhouli, J. Wuebker, and H. Ney. Translation modeling with bidirectional recurrent neural networks. EMNLP, pp. 14–25, 2014.

  67. I. Sutskever, J. Martens, G. Dahl, and G. Hinton. On the importance of initialization and momentum in deep learning. ICML Conference, pp. 1139–1147, 2013.

  68. I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. NIPS Conference, pp. 3104–3112, 2014.

  69. V. Veeriah, N. Zhuang, and G. Qi. Differential recurrent neural networks for action recognition. IEEE International Conference on Computer Vision, pp. 4041–4049, 2015.

  70. O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. CVPR Conference, pp. 3156–3164, 2015.

  71. S. Wang, C. Aggarwal, and H. Liu. Randomized feature engineering as a fast and accurate alternative to kernel methods. ACM KDD Conference, 2017.

  72. P. Werbos. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10), pp. 1550–1560, 1990.

  73. J. Weston, A. Bordes, S. Chopra, A. Rush, B. van Merrienboer, A. Joulin, and T. Mikolov. Towards AI-complete question answering: A set of pre-requisite toy tasks. arXiv:1502.05698, 2015. https://arxiv.org/abs/1502.05698

  74. J. Weston, S. Chopra, and A. Bordes. Memory networks. ICLR, 2015.

  75. C. Wu, A. Ahmed, A. Beutel, A. Smola, and H. Jing. Recurrent recommender networks. ACM International Conference on Web Search and Data Mining, pp. 495–503, 2017.

  76. W. Zaremba, I. Sutskever, and O. Vinyals. Recurrent neural network regularization. arXiv:1409.2329, 2014.

  77. L. Zhang, C. Aggarwal, and G.-J. Qi. Stock Price Prediction via Discovering Multi-Frequency Trading Patterns. ACM KDD Conference, 2017.

  78. S. Zhang, L. Yao, and A. Sun. Deep learning based recommender system: A survey and new perspectives. arXiv:1707.07435, 2017. https://arxiv.org/abs/1707.07435

  79. M. Zhou, Z. Ding, J. Tang, and D. Yin. Micro Behaviors: A new perspective in e-commerce recommender systems. WSDM Conference, 2018.

  80. http://caffe.berkeleyvision.org/

  81. http://torch.ch/

  82. http://deeplearning.net/software/theano/

  83. https://www.tensorflow.org/

  84. http://deeplearning.net/tutorial/lstm.html

  85. https://arxiv.org/abs/1609.08144

  86. https://github.com/karpathy/char-rnn

  87. https://deeplearning4j.org/lstm

  88. http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Aggarwal, C.C. (2018). Recurrent Neural Networks. In: Neural Networks and Deep Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-94463-0_7

  • DOI: https://doi.org/10.1007/978-3-319-94463-0_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-94462-3

  • Online ISBN: 978-3-319-94463-0

  • eBook Packages: Computer Science, Computer Science (R0)