Long Short-Term Memory Learns Context Free and Context Sensitive Languages

  • Felix A. Gers
  • Jürgen Schmidhuber


Previous work on learning regular languages from exemplary training sequences showed that Long Short-Term Memory (LSTM) outperforms traditional recurrent neural networks (RNNs). Here we demonstrate LSTM's superior performance on context-free language (CFL) benchmarks, and show that it works even better than previous hardwired or highly specialized architectures. To the best of our knowledge, LSTM variants are also the first RNNs to learn a context-sensitive language (CSL), namely, a^n b^n c^n.
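In benchmarks of this kind, the network is typically trained on next-symbol prediction over strings of the language. The following is a minimal sketch of how training data for a^n b^n c^n could be generated; the start/end markers and the pair-based framing are illustrative assumptions, not the paper's exact experimental setup.

```python
# Sketch: next-symbol-prediction data for the context-sensitive
# language a^n b^n c^n. Markers 'S' and 'T' are assumed delimiters,
# not necessarily those used in the original experiments.

def anbncn(n):
    """Return the string a^n b^n c^n for a given n."""
    return "a" * n + "b" * n + "c" * n

def prediction_pairs(s, start="S", end="T"):
    """Frame a string with start/end markers and emit
    (current symbol, next symbol) training pairs."""
    seq = start + s + end
    return list(zip(seq[:-1], seq[1:]))

pairs = prediction_pairs(anbncn(2))
# For 'SaabbccT' this yields pairs such as ('a', 'a'), ('a', 'b'), ...
# Predicting the first 'c' and the terminator correctly requires the
# network to count the a's and b's, which is what makes the task
# context-sensitive rather than regular.
```

Strings like these, sampled over a range of n, form the exemplary training sequences the abstract refers to; generalization is then tested on larger n than seen during training.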







Copyright information

© Springer-Verlag Wien 2001

Authors and Affiliations

  • Felix A. Gers
  • Jürgen Schmidhuber
  1. IDSIA, Manno, Switzerland
