Tree Memory Networks for Sequence Processing

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11727)

Abstract

Long-term dependencies are difficult to learn with Recurrent Neural Networks because of the vanishing and exploding gradient problems: the hidden-state transformation is applied once per time step, so the number of sequential weight applications grows linearly with sequence length. We introduce a new layer type, the Tree Memory Unit, whose weight application scales logarithmically with sequence length. We evaluate it on two pathologically hard memory benchmarks and two datasets. On the three of these tasks that require long-term dependencies, it strongly outperforms Long Short-Term Memory baselines; on sequences with few long-term dependencies, its performance is weaker. We believe our approach can lead to more efficient sequence learning when applied to sequences with long-term dependencies.
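To make the scaling claim concrete, the following is a minimal sketch (plain NumPy; not the paper's implementation, and the combine function, weight shapes, and names are hypothetical) of a balanced binary-tree reduction over a sequence. Under this reading of the claim, any input reaches the final memory through at most ⌈log2 n⌉ applications of the learned combination, whereas a vanilla RNN applies its transition n times in sequence.

```python
import numpy as np

def combine(left, right, W):
    """Hypothetical learned binary merge: maps two child memories
    to one parent memory with a single weight application."""
    return np.tanh(W @ np.concatenate([left, right]))

def tree_reduce(xs, W):
    """Reduce a sequence of d-dimensional vectors with a balanced binary tree.
    Each input reaches the root through O(log n) calls to `combine`,
    whereas a vanilla RNN applies its transition n times in sequence."""
    level = list(xs)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(combine(level[i], level[i + 1], W))
        if len(level) % 2 == 1:        # odd element is carried up unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]

# Toy usage: 8 random 4-dimensional inputs are merged in 3 tree levels.
rng = np.random.default_rng(0)
d = 4
W = rng.standard_normal((d, 2 * d)) * 0.1   # stand-in for learned weights
xs = [rng.standard_normal(d) for _ in range(8)]
root = tree_reduce(xs, W)
print(root.shape)   # (4,)
```

This is only an illustration of why a tree-shaped computation shortens the path, and hence the gradient chain, between distant inputs and the output; the actual Tree Memory Unit defines its own gating and update rules in the paper.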

Notes

  1. We note that this setup, which has become a standard for RNN evaluation, differs from the original implementation of Hochreiter and Schmidhuber [11].

References

  1. Arjovsky, M., Shah, A., Bengio, Y.: Unitary evolution recurrent neural networks. In: Proceedings of the 33rd International Conference on Machine Learning, vol. 48, pp. 1120–1128 (2016). http://arxiv.org/abs/1511.06464

  2. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014)

  3. Bengio, Y., Simard, P., Frasconi, P.: Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 5(2), 157–166 (1994). https://doi.org/10.1109/72.279181

  4. Bowman, S.R., Gauthier, J., Rastogi, A., Gupta, R., Manning, C.D., Potts, C.: A fast unified model for parsing and sentence understanding. arXiv preprint arXiv:1603.06021 (2016). https://doi.org/10.18653/v1/P16-1139

  5. Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1724–1734 (2014). https://doi.org/10.3115/v1/D14-1179

  6. Chung, J., Gulcehre, C., Cho, K., Bengio, Y.: Gated feedback recurrent neural networks. In: International Conference on Machine Learning, pp. 2067–2075 (2015). http://arxiv.org/abs/1502.02367

  7. Cooijmans, T., Ballas, N., Laurent, C., Gülçehre, Ç., Courville, A.: Recurrent batch normalization. arXiv preprint arXiv:1603.09025 (2016)

  8. Graves, A.: Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850 (2013)

  9. Graves, A., Wayne, G., Danihelka, I.: Neural Turing machines. arXiv preprint arXiv:1410.5401 (2014)

  10. Greff, K., Srivastava, R.K., Koutnik, J., Steunebrink, B.R., Schmidhuber, J.: LSTM: a search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. (2016). https://doi.org/10.1109/TNNLS.2016.2582924

  11. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735

  12. Hutter, M.: The human knowledge compression contest (2012). http://prize.hutter1.net

  13. Jing, L., et al.: Tunable efficient unitary neural networks (EUNN) and their application to RNNs. arXiv preprint arXiv:1612.05231 (2016)

  14. Kingma, D., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

  15. Kurtzer, G.M., Sochat, V., Bauer, M.W.: Singularity: scientific containers for mobility of compute. PLoS ONE 12(5), e0177459 (2017)

  16. Le, Q.V., Jaitly, N., Hinton, G.E.: A simple way to initialize recurrent networks of rectified linear units. arXiv preprint arXiv:1504.00941 (2015)

  17. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998). https://doi.org/10.1109/5.726791

  18. Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neural networks. In: International Conference on Machine Learning, pp. 1310–1318 (2013). http://arxiv.org/abs/1211.5063

  19. Pollack, J.B.: Recursive distributed representations. Artif. Intell. 46(1), 77–105 (1990). https://doi.org/10.1016/0004-3702(90)90005-K

  20. Socher, R., Perelygin, A., Wu, J., et al.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1631–1642 (2013). https://www.aclweb.org/anthology/D13-1170

  21. Tai, K.S., Socher, R., Manning, C.D.: Improved semantic representations from tree-structured long short-term memory networks. arXiv preprint arXiv:1503.00075 (2015)

  22. van den Oord, A., et al.: WaveNet: a generative model for raw audio. arXiv preprint arXiv:1609.03499 (2016)

  23. Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015)

Author information

Corresponding author

Correspondence to Frederik Diehl.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Diehl, F., Knoll, A. (2019). Tree Memory Networks for Sequence Processing. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Theoretical Neural Computation. ICANN 2019. Lecture Notes in Computer Science, vol. 11727. Springer, Cham. https://doi.org/10.1007/978-3-030-30487-4_34

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-30487-4_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30486-7

  • Online ISBN: 978-3-030-30487-4

  • eBook Packages: Computer Science (R0)
