Continual Prediction using LSTM with Forget Gates

Gers, Felix A.; Schmidhuber, Jürgen; Cummins, Fred

doi:10.1007/978-1-4471-0877-1_10

Part of the book series: Perspectives in Neural Computing (PERSPECT.NEURAL)

Abstract

Long Short-Term Memory (LSTM [1]) can solve many tasks not solvable by previous learning algorithms for recurrent neural networks (RNNs). We identify a weakness of LSTM networks processing continual input streams without explicitly marked sequence ends. Without resets, the internal state values may grow indefinitely and eventually cause the network to break down. Our remedy is an adaptive “forget gate” that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources. We review an illustrative benchmark problem on which standard LSTM outperforms other RNN algorithms. All algorithms (including LSTM) fail to solve a continual version of that problem. LSTM with forget gates, however, easily solves it in an elegant way.
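
The state growth described in the abstract can be illustrated with a toy sketch. It assumes a single scalar cell state updated as s_t = f * s_{t-1} + x_t, where f = 1.0 stands in for a cell without a forget gate and f < 1.0 stands in for a forget gate that is partly closed. The function name run_cell, the constant input stream, and the fixed value 0.9 are hypothetical choices made for illustration only; in the chapter the forget gate is adaptive and learned, not a fixed constant, and the input and output gates are omitted here.

```python
def run_cell(inputs, forget=1.0, state=0.0):
    """Iterate s_t = forget * s_{t-1} + x_t and return every intermediate state.

    `forget` is a fixed scalar standing in for a forget-gate activation
    (an assumption for illustration; the real gate is learned and input-dependent).
    """
    trajectory = []
    for x in inputs:
        state = forget * state + x
        trajectory.append(state)
    return trajectory


# A continual stream with no marked sequence ends: constant input of 1.0.
stream = [1.0] * 1000

# forget = 1.0: nothing is ever discarded, so the state grows with stream length.
no_forget = run_cell(stream, forget=1.0)

# forget = 0.9: old content decays, so the state stays bounded and
# approaches the fixed point 1 / (1 - 0.9) = 10.
with_forget = run_cell(stream, forget=0.9)

print("final state without forgetting:", no_forget[-1])
print("final state with forgetting:   ", with_forget[-1])
```

With the multiplier fixed at 1.0 the state keeps accumulating for as long as the stream runs, while any multiplier below 1.0 bounds it. A learned gate can additionally drive the multiplier close to zero at appropriate times, which corresponds to the self-reset the abstract describes.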

References

[1] S. Hochreiter and J. Schmidhuber, “Long short-term memory”, Neural Computation, 9(8):1735–1780, 1997.
[2] Y. Bengio, P. Simard, and P. Frasconi, “Learning long-term dependencies with gradient descent is difficult”, IEEE Transactions on Neural Networks, 5(2):157–166, 1994.
[3] F. A. Gers, J. Schmidhuber, and F. Cummins, “Learning to forget: Continual prediction with LSTM”, Tech. Rep. IDSIA-01-99, IDSIA, Lugano, CH, 1999.
[4] A. W. Smith and D. Zipser, “Learning sequential structures with the real-time recurrent learning algorithm”, International Journal of Neural Systems, 1(2):125–131, 1989.
[5] A. Cleeremans, D. Servan-Schreiber, and J. L. McClelland, “Finite-state automata and simple recurrent networks”, Neural Computation, 1:372–381, 1989.
[6] S. E. Fahlman, “The recurrent cascade-correlation learning algorithm”, in NIPS 3, R. P. Lippmann, J. E. Moody, and D. S. Touretzky, Eds., San Mateo, CA: Morgan Kaufmann, 1991, pp. 190–196.

Author information

Authors and Affiliations

IDSIA, Corso Elvezia 36, 6900, Lugano, Switzerland
Felix A. Gers, Jürgen Schmidhuber & Fred Cummins

Editor information

Editors and Affiliations

Dipartimento di Scienze Fisiche, “E.R. Caianiello”, Universita di Salerno, 84081, Baronissi (SA), Italy
Maria Marinaro
DMI, Universita di Salerno, 84081, Baronissi (SA), Italy
Roberto Tagliaferri

About this paper

Cite this paper

Gers, F.A., Schmidhuber, J., Cummins, F. (1999). Continual Prediction using LSTM with Forget Gates. In: Marinaro, M., Tagliaferri, R. (eds) Neural Nets WIRN Vietri-99. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-0877-1_10

DOI: https://doi.org/10.1007/978-1-4471-0877-1_10
Publisher Name: Springer, London
Print ISBN: 978-1-4471-1226-6
Online ISBN: 978-1-4471-0877-1
eBook Packages: Springer Book Archive
