Abstract
Long Short-Term Memory (LSTM) [1] can solve many tasks not solvable by previous learning algorithms for recurrent neural networks (RNNs). We identify a weakness of LSTM networks that process continual input streams without explicitly marked sequence ends. Without resets, the internal state values may grow indefinitely and eventually cause the network to break down. Our remedy is an adaptive “forget gate” that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources. We review an illustrative benchmark problem on which standard LSTM outperforms other RNN algorithms. All algorithms (including standard LSTM) fail to solve a continual version of that problem. LSTM with forget gates, however, solves it easily and elegantly.
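The state growth described in the abstract can be illustrated with a minimal single-cell sketch. It is not the authors' implementation: the gate weights, the constant input stream, and the names `run_cell` and `forget_bias` are hypothetical choices made for illustration, and the forget gate is held at a fixed value here, whereas in the paper it is learned. The sketch assumes the usual updates, s_t = s_{t-1} + i_t * g_t without a forget gate and s_t = f_t * s_{t-1} + i_t * g_t with one.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic squashing function used for the gate activations."""
    return 1.0 / (1.0 + math.exp(-x))


def run_cell(inputs, forget_bias=None):
    """Return the internal-state trajectory of one memory cell.

    forget_bias=None  -> no forget gate: s_t = s_{t-1} + i_t * g_t
    forget_bias=float -> forget gate:    s_t = f_t * s_{t-1} + i_t * g_t
    All weights are fixed, hypothetical constants (nothing is learned).
    """
    s = 0.0
    trajectory = []
    for x in inputs:
        i_t = sigmoid(2.0 * x)   # input gate activation (assumed weight 2.0)
        g_t = math.tanh(x)       # squashed cell input
        # Without a forget gate the old state is carried over unchanged.
        f_t = 1.0 if forget_bias is None else sigmoid(forget_bias)
        s = f_t * s + i_t * g_t
        trajectory.append(s)
    return trajectory


# A continual stream with no sequence-end marker and no external reset.
stream = [1.0] * 1000
no_forget = run_cell(stream)
with_forget = run_cell(stream, forget_bias=2.0)

print(f"final state without forget gate: {no_forget[-1]:.1f}")
print(f"final state with forget gate:    {with_forget[-1]:.3f}")
```

Under these assumptions the state without a forget gate grows roughly linearly with the length of the stream, while the gated state settles at a bounded value. A learned forget gate would additionally drive f_t toward zero at appropriate times to reset the cell, which a constant gate cannot show.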
References
[1] S. Hochreiter and J. Schmidhuber, “Long short-term memory”, Neural Computation, 9(8):1735–1780, 1997.
[2] Y. Bengio, P. Simard, and P. Frasconi, “Learning long-term dependencies with gradient descent is difficult”, IEEE Transactions on Neural Networks, 5(2):157–166, 1994.
[3] F. A. Gers, J. Schmidhuber, and F. Cummins, “Learning to forget: Continual prediction with LSTM”, Tech. Rep. IDSIA-01-99, IDSIA, Lugano, CH, 1999.
[4] A. W. Smith and D. Zipser, “Learning sequential structures with the real-time recurrent learning algorithm”, International Journal of Neural Systems, 1(2):125–131, 1989.
[5] A. Cleeremans, D. Servan-Schreiber, and J. L. McClelland, “Finite-state automata and simple recurrent networks”, Neural Computation, 1:372–381, 1989.
[6] S. E. Fahlman, “The recurrent cascade-correlation learning algorithm”, in NIPS 3, R. P. Lippmann, J. E. Moody, and D. S. Touretzky, Eds. San Mateo, CA: Morgan Kaufmann, 1991, pp. 190–196.
Copyright information
© 1999 Springer-Verlag London Limited
Cite this paper
Gers, F.A., Schmidhuber, J., Cummins, F. (1999). Continual Prediction using LSTM with Forget Gates. In: Marinaro, M., Tagliaferri, R. (eds) Neural Nets WIRN Vietri-99. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-0877-1_10
DOI: https://doi.org/10.1007/978-1-4471-0877-1_10
Publisher Name: Springer, London
Print ISBN: 978-1-4471-1226-6
Online ISBN: 978-1-4471-0877-1
eBook Packages: Springer Book Archive