Abstract
The vanishing (and exploding) gradient effect is a common problem for recurrent neural networks that use backpropagation to compute derivatives. We construct an analytical framework for estimating the contribution of each training example to the norm of the long-term components of the target function's gradient, and we use it to keep the gradient norm within a suitable range. Using this subroutine, we can construct mini-batches for stochastic gradient descent (SGD) training that lead to high performance and accuracy of the trained network, even on very complex tasks. To evaluate the framework experimentally, we use synthetic benchmarks designed to test the ability of RNNs to capture long-term dependencies. Our network can detect links between events in a temporal sequence at ranges of 100 time steps and longer.
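The abstract compresses the method into a few sentences; the sketch below makes the core idea concrete. It is a minimal, hypothetical Python/NumPy illustration, not the authors' actual algorithm: the toy vanilla RNN, the `long_term_grad_norm` and `build_batch` helpers, the `horizon` split between short- and long-term BPTT components, and the acceptance range `[lo, hi]` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vanilla RNN: h_t = tanh(W h_{t-1} + U x_t), scalar readout L = v . h_T.
H, X, T = 16, 4, 50          # hidden size, input size, sequence length
W = rng.normal(0, 1.0 / np.sqrt(H), (H, H))
U = rng.normal(0, 1.0 / np.sqrt(X), (H, X))
v = rng.normal(0, 1.0 / np.sqrt(H), H)

def long_term_grad_norm(x, horizon=20):
    """Norm of the gradient components (w.r.t. the recurrent matrix W)
    contributed by time steps more than `horizon` steps before the
    output -- the "long-term" part of backpropagation through time."""
    # Forward pass, keeping all hidden states.
    hs = [np.zeros(H)]
    for t in range(T):
        hs.append(np.tanh(W @ hs[-1] + U @ x[t]))
    # Backward pass: delta = dL/dh_t for the loss L = v . h_T.
    delta = v.copy()
    long_norm_sq = 0.0
    for t in range(T, 0, -1):
        g = delta * (1.0 - hs[t] ** 2)        # backprop through tanh
        if T - t > horizon:                   # distant-step contribution only
            long_norm_sq += np.sum(np.outer(g, hs[t - 1]) ** 2)  # dL/dW piece
        delta = W.T @ g
    return np.sqrt(long_norm_sq)

def build_batch(pool, batch_size, lo=1e-4, hi=1e2, horizon=20):
    """Sampling-based mini-batch construction: keep only examples whose
    long-term gradient norm lies in the (assumed) target range [lo, hi]."""
    batch = []
    for x in pool:
        if lo <= long_term_grad_norm(x, horizon) <= hi:
            batch.append(x)
        if len(batch) == batch_size:
            break
    return batch

pool = [rng.normal(size=(T, X)) for _ in range(256)]
batch = build_batch(pool, batch_size=32)
print(f"selected {len(batch)} examples for the SGD step")
```

In this reading, examples whose long-term gradient components have already vanished (or exploded past the upper threshold) are simply not sampled for the current SGD step, keeping the aggregated mini-batch gradient informative about long-range dependencies.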
Cite this paper
Chernodub, A., Nowicki, D.: Sampling-Based Gradient Regularization for Capturing Long-Term Dependencies in Recurrent Neural Networks. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds.) Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science, vol. 9948. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46672-9_11