Abstract
The vanishing (and exploding) gradient effect is a common problem for recurrent neural networks that use backpropagation to compute derivatives. We construct an analytical framework for estimating the contribution of each training example to the norm of the long-term components of the target function's gradient, and we use it to keep the gradient norm within a suitable range. Using this subroutine, we can construct mini-batches for stochastic gradient descent (SGD) training that lead to high performance and accuracy of the trained network, even on very complex tasks. To evaluate the framework experimentally, we use synthetic benchmarks designed to test the ability of RNNs to capture long-term dependencies. Our network can detect links between events in a temporal sequence at ranges of 100 time steps and longer.
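The abstract compresses the method into a few sentences; the sketch below makes the core idea concrete. It is a minimal, hypothetical Python/NumPy illustration, not the authors' actual algorithm: the toy vanilla RNN, the `long_term_grad_norm` and `build_batch` helpers, the `horizon` split between short- and long-term BPTT components, and the acceptance range `[lo, hi]` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vanilla RNN: h_t = tanh(W h_{t-1} + U x_t), scalar readout L = v . h_T.
H, X, T = 16, 4, 50          # hidden size, input size, sequence length
W = rng.normal(0, 1.0 / np.sqrt(H), (H, H))
U = rng.normal(0, 1.0 / np.sqrt(X), (H, X))
v = rng.normal(0, 1.0 / np.sqrt(H), H)

def long_term_grad_norm(x, horizon=20):
    """Norm of the gradient components (w.r.t. the recurrent matrix W)
    contributed by time steps more than `horizon` steps before the
    output -- the "long-term" part of backpropagation through time."""
    # Forward pass, keeping all hidden states.
    hs = [np.zeros(H)]
    for t in range(T):
        hs.append(np.tanh(W @ hs[-1] + U @ x[t]))
    # Backward pass: delta = dL/dh_t for the loss L = v . h_T.
    delta = v.copy()
    long_norm_sq = 0.0
    for t in range(T, 0, -1):
        g = delta * (1.0 - hs[t] ** 2)        # backprop through tanh
        if T - t > horizon:                   # distant-step contribution only
            long_norm_sq += np.sum(np.outer(g, hs[t - 1]) ** 2)  # dL/dW piece
        delta = W.T @ g
    return np.sqrt(long_norm_sq)

def build_batch(pool, batch_size, lo=1e-4, hi=1e2, horizon=20):
    """Sampling-based mini-batch construction: keep only examples whose
    long-term gradient norm lies in the (assumed) target range [lo, hi]."""
    batch = []
    for x in pool:
        if lo <= long_term_grad_norm(x, horizon) <= hi:
            batch.append(x)
        if len(batch) == batch_size:
            break
    return batch

pool = [rng.normal(size=(T, X)) for _ in range(256)]
batch = build_batch(pool, batch_size=32)
print(f"selected {len(batch)} examples for the SGD step")
```

In this reading, examples whose long-term gradient components have already vanished (or exploded past the upper threshold) are simply not sampled for the current SGD step, keeping the aggregated mini-batch gradient informative about long-range dependencies.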
Cite this paper
Chernodub, A., Nowicki, D.: Sampling-Based Gradient Regularization for Capturing Long-Term Dependencies in Recurrent Neural Networks. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds.) Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science, vol. 9948. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46672-9_11