Fast Training of Deep LSTM Networks

  • Wen Yu
  • Xiaoou Li
  • Jesus Gonzalez
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11554)


Abstract

Deep recurrent neural networks (RNNs), such as the long short-term memory (LSTM) network, have many advantages over feedforward networks. However, LSTM training methods such as backpropagation through time (BPTT) are slow, because gradients must be propagated through the recurrent connections at every step of the sequence.
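For reference, the standard LSTM cell of Hochreiter and Schmidhuber (1997) computes, in LaTeX notation (grouping the terms into input-driven and state-driven parts is our reading of the "forward" and "recurrent" substructures, not notation taken from the paper):

\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), &
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f), \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o), &
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c), \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, &
h_t &= o_t \odot \tanh(c_t).
\end{aligned}

The W x_t terms depend only on the current input (a forward, MLP-like substructure), while the U h_{t-1} terms carry the recurrence; BPTT is slow precisely because gradients must travel through the U h_{t-1} chain across the whole sequence.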

In this paper, by separating the LSTM cell into forward and recurrent substructures, we propose a training method that is much simpler and faster than BPTT. The deep LSTM is modified by combining the deep RNN with the multilayer perceptron (MLP). Simulation results show that our fast training method outperforms BPTT on deep LSTM networks.
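The abstract does not give the algorithm itself, so the sketch below is only one plausible reading, not the authors' method: it assumes that "separating forward and recurrent substructures" means detaching the recurrent state (h_{t-1}, c_{t-1}) at every step, so each step trains like one MLP layer under ordinary backpropagation, and that the MLP combination is a linear readout. The class name DetachedLSTM and all hyperparameters are hypothetical.

# Minimal PyTorch sketch, NOT the paper's algorithm: the recurrent state is
# detached at every step, so gradients never flow back through time and each
# step backpropagates like a feedforward (MLP) layer.
import torch
import torch.nn as nn

class DetachedLSTM(nn.Module):  # hypothetical name
    def __init__(self, n_in, n_hidden, n_out):
        super().__init__()
        self.cell = nn.LSTMCell(n_in, n_hidden)    # W (input) and U (recurrent) weights
        self.readout = nn.Linear(n_hidden, n_out)  # MLP-style output layer

    def forward(self, x_seq):
        # x_seq: (T, batch, n_in)
        h = x_seq.new_zeros(x_seq.size(1), self.cell.hidden_size)
        c = torch.zeros_like(h)
        outputs = []
        for x_t in x_seq:
            # Detach the recurrent inputs: no gradient flows back in time,
            # so this step is trained exactly like one feedforward layer.
            h, c = self.cell(x_t, (h.detach(), c.detach()))
            outputs.append(self.readout(h))
        return torch.stack(outputs)

# Usage: one gradient step on a toy sequence-regression batch.
model = DetachedLSTM(n_in=1, n_hidden=16, n_out=1)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x = torch.randn(50, 8, 1)  # T=50 time steps, batch of 8
y = torch.randn(50, 8, 1)
loss = nn.functional.mse_loss(model(x), y)
opt.zero_grad()
loss.backward()
opt.step()

Under these assumptions the speedup comes from never storing or traversing BPTT's unrolled gradient chains, at the cost of ignoring long-range credit assignment through the state.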



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Departamento de Control Automático, CINVESTAV-IPN (National Polytechnic Institute), Mexico City, Mexico
  2. Departamento de Computación, CINVESTAV-IPN (National Polytechnic Institute), Mexico City, Mexico
