
Overview of Long Short-Term Memory Neural Networks

Deep Learning Classifiers with Memristive Networks

Part of the book series: Modeling and Optimization in Science and Technologies (MOST, volume 14)

Abstract

Long Short-Term Memory was designed to avoid the vanishing and exploding gradient problems of recurrent neural networks. Over the last twenty years, various modifications of the original LSTM cell have been proposed. This chapter gives an overview of basic LSTM cell structures and demonstrates forward and backward propagation within the most widely used configuration, called the traditional LSTM cell. In addition, LSTM neural network configurations are described.



Author information


Corresponding author

Correspondence to Alex Pappachen James.


Chapter Highlights

  • Long short-term memory (LSTM) is a special type of recurrent neural network (RNN).

  • An LSTM unit has a memory cell and multiple weighted gates. Therefore, it does not suffer from the vanishing and exploding gradient problems of a standard RNN and can process sequences of arbitrary length.

  • The original LSTM unit has no forget gate (NFG).

  • Traditional LSTM configuration (a minimal forward-pass sketch in code follows this list):

    $$ \begin{pmatrix} g_{t} \\ i_{t} \\ f_{t} \\ o_{t} \end{pmatrix} = \begin{pmatrix} \tanh \\ \sigma \\ \sigma \\ \sigma \end{pmatrix} \cdot \begin{pmatrix} W^{(g)} & U^{(g)} \\ W^{(i)} & U^{(i)} \\ W^{(f)} & U^{(f)} \\ W^{(o)} & U^{(o)} \end{pmatrix} \cdot \begin{pmatrix} x_{t} \\ h_{t-1} \end{pmatrix};$$
    $$C_{t} = f_{t} \odot C_{t-1} + i_{t} \odot g_{t}; \qquad h_{t} = o_{t} \odot \tanh(C_{t}).$$
  • The traditional LSTM with peephole connections can learn precise timing and is often referred to as the ‘Vanilla’ LSTM.

  • ConvLSTM networks are effective in spatiotemporal sequence problems.

  • Updates in the Phased LSTM occur at irregularly sampled time points \(t_{j}\), which can be controlled.

  • Depending on the lengths of the input and output sequences, the following LSTM models are distinguished: ‘One-to-One’, ‘One-to-Many’, ‘Many-to-One’ and ‘Many-to-Many’.

  • An LSTM architecture can differ in directionality and dimensionality, or combine both (e.g. bidirectional and multidimensional LSTMs).
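
A minimal NumPy sketch of the traditional LSTM forward pass listed above, intended only to make the gate equations concrete: the function and variable names (lstm_step, n_in, n_hid) are illustrative rather than taken from the chapter, bias terms are omitted to match the equations, and the weights are random rather than trained.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, C_prev, W, U):
        """One forward step of the traditional LSTM cell (no bias terms)."""
        # W and U are dicts keyed by 'g', 'i', 'f', 'o' holding the input
        # weights W^(k) and recurrent weights U^(k) of each gate.
        g_t = np.tanh(W['g'] @ x_t + U['g'] @ h_prev)   # candidate cell input
        i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev)   # input gate
        f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev)   # forget gate
        o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev)   # output gate
        C_t = f_t * C_prev + i_t * g_t                  # element-wise (Hadamard) products
        h_t = o_t * np.tanh(C_t)
        return h_t, C_t

    # Example: run the cell over a random sequence of 5 input vectors.
    rng = np.random.default_rng(0)
    n_in, n_hid = 3, 4
    W = {k: 0.1 * rng.standard_normal((n_hid, n_in)) for k in 'gifo'}
    U = {k: 0.1 * rng.standard_normal((n_hid, n_hid)) for k in 'gifo'}
    h, C = np.zeros(n_hid), np.zeros(n_hid)
    for x in rng.standard_normal((5, n_in)):
        h, C = lstm_step(x, h, C, W, U)
    print(h)  # final hidden state of the sequence

Keeping only the final hidden state, as in the loop above, corresponds to the ‘Many-to-One’ configuration; collecting \(h_{t}\) at every time step instead yields a ‘Many-to-Many’ model.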


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Smagulova, K., James, A.P. (2020). Overview of Long Short-Term Memory Neural Networks. In: James, A. (eds) Deep Learning Classifiers with Memristive Networks. Modeling and Optimization in Science and Technologies, vol 14. Springer, Cham. https://doi.org/10.1007/978-3-030-14524-8_11
