Dynamic Cortex Memory: Enhancing Recurrent Neural Networks for Gradient-Based Sequence Learning
This paper introduces a novel recurrent neural network (RNN) model for gradient-based sequence learning. The presented dynamic cortex memory (DCM) is an extension of the well-known long short-term memory (LSTM) model. The main innovation of the DCM is that it enhances the interplay between the gates and the error carousel through several new, trainable connections. These connections enable direct signal transfer from one gate to another. With this enhancement, the networks converge faster than LSTMs when trained with back-propagation through time (BPTT) under the same training conditions. Furthermore, DCMs yield better generalization results than LSTMs. This behaviour is shown for several supervised problem scenarios, including storing precise values, addition, and learning a context-sensitive grammar.
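To make the idea of gate-to-gate connections concrete, the following is a minimal sketch of one time step of a single scalar memory cell. It is not the authors' formulation: the function name `gated_cell_step`, the `gate_links` dictionary, and the choice to feed each gate the previous-step activations of the other gates are all assumptions made for illustration. The abstract only states that trainable connections carry signals directly between the gates.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_cell_step(x, h_prev, c_prev, gates_prev, w, gate_links):
    """One time step of a single scalar memory cell (illustrative only).

    x, h_prev, c_prev: current input, previous output, previous cell state.
    gates_prev: previous activations of the 'in', 'forget' and 'out' gates.
    w: maps 'in', 'forget', 'out', 'cell' to (input weight, recurrent weight, bias).
    gate_links: maps (source_gate, target_gate) to a weight. This is a
        hypothetical stand-in for the gate-to-gate connections; missing
        entries count as zero, and all zeros gives a plain LSTM-style cell.
    """
    names = ("in", "forget", "out")
    gates = {}
    for name in names:
        w_x, w_h, b = w[name]
        net = w_x * x + w_h * h_prev + b
        # Assumed extra term: signals from the other gates' previous activations.
        for src in names:
            if src != name:
                net += gate_links.get((src, name), 0.0) * gates_prev[src]
        gates[name] = sigmoid(net)

    w_x, w_h, b = w["cell"]
    candidate = math.tanh(w_x * x + w_h * h_prev + b)
    # Cell state update: keep part of the old state, add gated new input.
    c = gates["forget"] * c_prev + gates["in"] * candidate
    h = gates["out"] * math.tanh(c)
    return h, c, gates
```

With an empty `gate_links` dictionary the step reduces to a standard LSTM-style update without peephole connections, which makes it easy to compare the two variants under identical weights.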
Keywords: Dynamic Cortex Memory (DCM), Recurrent Neural Networks (RNN), Neural Networks, Long Short Term Memory (LSTM)