Tree Memory Networks for Sequence Processing

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11727)

Abstract

Long-term dependencies are difficult to learn with Recurrent Neural Networks because of the vanishing and exploding gradient problems: the hidden-state transformation is applied once per time step, so the number of sequential weight applications grows linearly with sequence length. We introduce a new layer type, the Tree Memory Unit, whose weight application scales logarithmically with sequence length. We evaluate it on two pathologically hard memory benchmarks and two datasets. On the three of these tasks that require long-term dependencies, it strongly outperforms Long Short-Term Memory baselines; on sequences with few long-term dependencies, its performance is weaker. We believe our approach can lead to more efficient sequence learning when applied to sequences with long-term dependencies.
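To make the scaling claim concrete, the following is a minimal sketch (plain NumPy; not the paper's implementation, and the combine function, weight shapes, and names are hypothetical) of a balanced binary-tree reduction over a sequence. Under this reading of the claim, any input reaches the final memory through at most ⌈log2 n⌉ applications of the learned combination, whereas a vanilla RNN applies its transition n times in sequence.

```python
import numpy as np

def combine(left, right, W):
    """Hypothetical learned binary merge: maps two child memories
    to one parent memory with a single weight application."""
    return np.tanh(W @ np.concatenate([left, right]))

def tree_reduce(xs, W):
    """Reduce a sequence of d-dimensional vectors with a balanced binary tree.
    Each input reaches the root through O(log n) calls to `combine`,
    whereas a vanilla RNN applies its transition n times in sequence."""
    level = list(xs)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(combine(level[i], level[i + 1], W))
        if len(level) % 2 == 1:        # odd element is carried up unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]

# Toy usage: 8 random 4-dimensional inputs are merged in 3 tree levels.
rng = np.random.default_rng(0)
d = 4
W = rng.standard_normal((d, 2 * d)) * 0.1   # stand-in for learned weights
xs = [rng.standard_normal(d) for _ in range(8)]
root = tree_reduce(xs, W)
print(root.shape)   # (4,)
```

This is only an illustration of why a tree-shaped computation shortens the path, and hence the gradient chain, between distant inputs and the output; the actual Tree Memory Unit defines its own gating and update rules in the paper.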

Notes

  1. We note that this setup, which has become a standard for RNN evaluation, differs from the original implementation of Hochreiter and Schmidhuber [11].

References

  1. Arjovsky, M., Shah, A., Bengio, Y.: Unitary evolution recurrent neural networks. In: Proceedings of the 33rd International Conference on Machine Learning, vol. 48, pp. 1120–1128 (2016). http://arxiv.org/abs/1511.06464

  2. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014)

  3. Bengio, Y., Simard, P., Frasconi, P.: Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 5(2), 157–166 (1994). https://doi.org/10.1109/72.279181

  4. Bowman, S.R., Gauthier, J., Rastogi, A., Gupta, R., Manning, C.D., Potts, C.: A fast unified model for parsing and sentence understanding. arXiv preprint arXiv:1603.06021 (2016). https://doi.org/10.18653/v1/P16-1139

  5. Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1724–1734 (2014). https://doi.org/10.3115/v1/D14-1179

  6. Chung, J., Gulcehre, C., Cho, K., Bengio, Y.: Gated feedback recurrent neural networks. In: International Conference on Machine Learning, pp. 2067–2075 (2015). http://arxiv.org/abs/1502.02367

  7. Cooijmans, T., Ballas, N., Laurent, C., Gülçehre, Ç., Courville, A.: Recurrent batch normalization. arXiv preprint arXiv:1603.09025 (2016)

  8. Graves, A.: Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850 (2013)

  9. Graves, A., Wayne, G., Danihelka, I.: Neural Turing machines. arXiv preprint arXiv:1410.5401 (2014)

  10. Greff, K., Srivastava, R.K., Koutnik, J., Steunebrink, B.R., Schmidhuber, J.: LSTM: a search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. (2016). https://doi.org/10.1109/TNNLS.2016.2582924

  11. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735

  12. Hutter, M.: The human knowledge compression contest (2012). http://prize.hutter1.net

  13. Jing, L., et al.: Tunable efficient unitary neural networks (EUNN) and their application to RNNs. arXiv preprint arXiv:1612.05231 (2016)

  14. Kingma, D., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

  15. Kurtzer, G.M., Sochat, V., Bauer, M.W.: Singularity: scientific containers for mobility of compute. PLoS ONE 12(5), e0177459 (2017)

  16. Le, Q.V., Jaitly, N., Hinton, G.E.: A simple way to initialize recurrent networks of rectified linear units. arXiv preprint arXiv:1504.00941 (2015)

  17. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998). https://doi.org/10.1109/5.726791

  18. Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neural networks. In: International Conference on Machine Learning, pp. 1310–1318 (2013). http://arxiv.org/abs/1211.5063

  19. Pollack, J.B.: Recursive distributed representations. Artif. Intell. 46(1), 77–105 (1990). https://doi.org/10.1016/0004-3702(90)90005-K

  20. Socher, R., Perelygin, A., Wu, J., et al.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1631–1642 (2013). https://www.aclweb.org/anthology/D13-1170

  21. Tai, K.S., Socher, R., Manning, C.D.: Improved semantic representations from tree-structured long short-term memory networks. arXiv preprint arXiv:1503.00075 (2015)

  22. van den Oord, A., et al.: WaveNet: a generative model for raw audio. arXiv preprint arXiv:1609.03499 (2016)

  23. Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015)

Author information

Corresponding author

Correspondence to Frederik Diehl.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Diehl, F., Knoll, A. (2019). Tree Memory Networks for Sequence Processing. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Theoretical Neural Computation. ICANN 2019. Lecture Notes in Computer Science, vol. 11727. Springer, Cham. https://doi.org/10.1007/978-3-030-30487-4_34

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-30487-4_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30486-7

  • Online ISBN: 978-3-030-30487-4

  • eBook Packages: Computer Science (R0)
