
Sampling-Based Gradient Regularization for Capturing Long-Term Dependencies in Recurrent Neural Networks

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9948)

Abstract

The vanishing (and exploding) gradient effect is a common problem for recurrent neural networks that use the backpropagation method to calculate derivatives. We construct an analytical framework to estimate the contribution of each training example to the norm of the long-term components of the target function's gradient, and we use it to keep the norm of the gradients within a suitable range. Using this subroutine, we can construct mini-batches for stochastic gradient descent (SGD) training that lead to high performance and accuracy of the trained network even on very complex tasks. To validate our framework experimentally, we use special synthetic benchmarks that test the ability of RNNs to capture long-term dependencies. Our network can detect links between events in a temporal sequence at ranges of 100 time steps and longer.
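The abstract describes the core idea: estimate, per training example, the norm of the long-term components of the gradient, and admit an example into the SGD mini-batch only if that norm lies in a suitable range. The snippet below is a minimal illustrative sketch of that idea, not the authors' exact procedure. It assumes a vanilla tanh RNN with a squared-error loss on the final step, a fixed horizon TAU separating long-term from short-term gradient components, and a hand-picked band [LOW, HIGH] as the "suitable range"; all names (long_term_grad_norm, select_batch, TAU, LOW, HIGH) are hypothetical.

```python
# Illustrative sketch: build SGD mini-batches from examples whose estimated
# long-term gradient norm lies in a target band. Not the paper's exact method;
# the RNN, loss, horizon TAU, and band [LOW, HIGH] are assumptions.
import numpy as np

rng = np.random.default_rng(0)

N_HIDDEN, N_INPUT, SEQ_LEN = 32, 8, 120
TAU = 100              # horizon beyond which gradient components count as "long-term"
LOW, HIGH = 1e-6, 1e2  # acceptable range for the long-term gradient norm
BATCH_SIZE = 16

# Randomly initialised vanilla RNN parameters (for illustration only).
W_hh = rng.normal(0.0, 1.0 / np.sqrt(N_HIDDEN), (N_HIDDEN, N_HIDDEN))
W_xh = rng.normal(0.0, 1.0 / np.sqrt(N_INPUT), (N_HIDDEN, N_INPUT))
w_out = rng.normal(0.0, 1.0 / np.sqrt(N_HIDDEN), N_HIDDEN)


def long_term_grad_norm(x, target):
    """Estimate this example's contribution to the norm of the long-term
    gradient components: the loss gradient w.r.t. the hidden state TAU
    steps before the end of the sequence."""
    # Forward pass, keeping hidden states for backpropagation through time.
    h = np.zeros(N_HIDDEN)
    hs = [h]
    for t in range(SEQ_LEN):
        h = np.tanh(W_hh @ h + W_xh @ x[t])
        hs.append(h)
    y = w_out @ hs[-1]
    # Gradient of 0.5 * (y - target)^2 w.r.t. the final hidden state.
    dh = (y - target) * w_out
    # Propagate back TAU steps through the tanh units and W_hh.
    for t in range(SEQ_LEN, SEQ_LEN - TAU, -1):
        dh = W_hh.T @ (dh * (1.0 - hs[t] ** 2))
    return np.linalg.norm(dh)


def select_batch(examples, batch_size=BATCH_SIZE):
    """Build a mini-batch only from examples whose estimated long-term
    gradient norm falls inside the target band [LOW, HIGH]."""
    batch = []
    for x, target in examples:
        if LOW <= long_term_grad_norm(x, target) <= HIGH:
            batch.append((x, target))
        if len(batch) == batch_size:
            break
    return batch


# Usage: draw candidate sequences and keep the ones that pass the check.
candidates = [(rng.normal(size=(SEQ_LEN, N_INPUT)), rng.normal()) for _ in range(200)]
batch = select_batch(candidates)
print(f"selected {len(batch)} of {len(candidates)} candidate examples")
```

In this sketch, an example's long-term contribution is approximated by the norm of the loss gradient backpropagated TAU steps through the recurrent Jacobians; examples whose long-term gradient has effectively vanished or exploded are simply skipped when forming the mini-batch.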



Author information


Corresponding author

Correspondence to Artem Chernodub.



Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Chernodub, A., Nowicki, D. (2016). Sampling-Based Gradient Regularization for Capturing Long-Term Dependencies in Recurrent Neural Networks. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds.) Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science, vol. 9948. Springer, Cham. https://doi.org/10.1007/978-3-319-46672-9_11

  • DOI: https://doi.org/10.1007/978-3-319-46672-9_11

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46671-2

  • Online ISBN: 978-3-319-46672-9

  • eBook Packages: Computer Science, Computer Science (R0)
