Recurrent Neural Networks



“Democracy is the recurrent suspicion that more than half the people are right more than half the time.”—The New Yorker, July 3, 1944.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. IBM T. J. Watson Research Center, International Business Machines, Yorktown Heights, USA
