Abstract
Long-term dependencies are difficult to learn with recurrent neural networks because of the vanishing and exploding gradient problems: the hidden-state transform is applied once per time step, i.e. a number of times that grows linearly with the sequence length. We introduce a new layer type, the Tree Memory Unit, whose weight application scales logarithmically with the sequence length. We evaluate it on two pathologically hard memory benchmarks and two datasets. On the three of these four tasks that require long-term dependencies, it strongly outperforms Long Short-Term Memory (LSTM) baselines; on sequences with few long-term dependencies, however, its performance is weaker. We believe that our approach can make sequence learning more efficient when applied to sequences with long-term dependencies.
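The scaling argument in the abstract can be made concrete with a small sketch. The NumPy snippet below (our own illustration, not the paper's implementation) merges a sequence pairwise up a binary tree, so every input reaches the root through log2(T) weight applications rather than the T applications of a step-by-step RNN. The affine-plus-tanh merge function, the weight shapes, and the power-of-two length are simplifying assumptions; the actual Tree Memory Unit cell is defined in the paper.

import numpy as np

rng = np.random.default_rng(0)
d = 8                      # hidden size (arbitrary for this sketch)
T = 16                     # sequence length, assumed a power of two here
W = rng.standard_normal((d, 2 * d)) / np.sqrt(2 * d)  # shared merge weights
b = np.zeros(d)

def merge(left, right):
    # Combine two child states into one parent state.
    return np.tanh(W @ np.concatenate([left, right]) + b)

# Leaves: one state per input step (e.g., embedded tokens).
states = [rng.standard_normal(d) for _ in range(T)]

depth = 0
while len(states) > 1:     # each tree level halves the number of states
    states = [merge(states[i], states[i + 1])
              for i in range(0, len(states), 2)]
    depth += 1

print(depth)               # 4 = log2(16) merges per input-to-root path,
                           # versus 16 sequential steps for an RNN

An RNN would instead thread one state through all T inputs, so the gradient must survive T successive transforms; in the tree, it only has to survive log2(T) of them.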
Notes
1. We note that this setup, which has become a standard for RNN evaluation, differs from the original implementation by Schmidhuber [11].
References
Arjovsky, M., Shah, A., Bengio, Y.: Unitary evolution recurrent neural networks. In: Proceedings of the 33rd International Conference on Machine Learning, vol. 48, pp. 1120–1128 (2016). http://arxiv.org/abs/1511.06464
Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014)
Bengio, Y., Simard, P., Frasconi, P.: Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 5(2), 157–166 (1994). https://doi.org/10.1109/72.279181
Bowman, S.R., Gauthier, J., Rastogi, A., Gupta, R., Manning, C.D., Potts, C.: A fast unified model for parsing and sentence understanding. arXiv preprint arXiv:1603.06021 (2016). https://doi.org/10.18653/v1/P16-1139
Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1724–1734 (2014). https://doi.org/10.3115/v1/D14-1179
Chung, J., Gulcehre, C., Cho, K., Bengio, Y.: Gated feedback recurrent neural networks. In: International Conference on Machine Learning, pp. 2067–2075 (2015). http://arxiv.org/abs/1502.02367
Cooijmans, T., Ballas, N., Laurent, C., Gülçehre, Ç., Courville, A.: Recurrent batch normalization. arXiv preprint arXiv:1603.09025 (2016)
Graves, A.: Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850 (2013)
Graves, A., Wayne, G., Danihelka, I.: Neural Turing machines. arXiv preprint arXiv:1410.5401 (2014)
Greff, K., Srivastava, R.K., Koutnik, J., Steunebrink, B.R., Schmidhuber, J.: LSTM: a search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. (2016). https://doi.org/10.1109/TNNLS.2016.2582924
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735
Hutter, M.: The human knowledge compression contest, p. 6 (2012). http://prize.hutter1.net
Jing, L., et al.: Tunable efficient unitary neural networks (EUNN) and their application to RNNs. arXiv preprint arXiv:1612.05231 (2016)
Kingma, D., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Kurtzer, G.M., Sochat, V., Bauer, M.W.: Singularity: Scientific containers for mobility of compute. PLoS ONE 12(5), e0177459 (2017)
Le, Q.V., Jaitly, N., Hinton, G.E.: A simple way to initialize recurrent networks of rectified linear units. arXiv preprint arXiv:1504.00941 (2015)
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998). https://doi.org/10.1109/5.726791
Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neural networks. In: International Conference on Machine Learning, pp. 1310–1318 (2013). http://arxiv.org/abs/1211.5063
Pollack, J.B.: Recursive distributed representations. Artif. Intell. 46(1), 77–105 (1990). https://doi.org/10.1016/0004-3702(90)90005-K
Socher, R., Perelygin, A., Wu, J., et al.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1631–1642 (2013). https://www.aclweb.org/anthology/D13-1170
Tai, K.S., Socher, R., Manning, C.D.: Improved semantic representations from tree-structured long short-term memory networks. arXiv preprint arXiv:1503.00075 (2015)
van den Oord, A., et al.: WaveNet: a generative model for raw audio. arXiv preprint arXiv:1609.03499 (2016)
Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015)
Cite this paper
Diehl, F., Knoll, A. (2019). Tree Memory Networks for Sequence Processing. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Theoretical Neural Computation. ICANN 2019. Lecture Notes in Computer Science, vol. 11727. Springer, Cham. https://doi.org/10.1007/978-3-030-30487-4_34