Recurrent neural network with attention mechanism for language model


The rapid growth of the Internet has driven a matching growth in textual data, and people draw on this large volume of text for the information they need to solve problems. Such text may carry latent information such as crowd opinions, opinions about products, or market-relevant information. To use it, however, the question of how to extract features from text must first be answered. A model that extracts text features with a neural network is called a neural network language model. The features follow the n-gram model concept, that is, they capture co-occurrence relationships between words. Word vectors are important because sentence vectors and document vectors still depend on the relationships between words; this study therefore focuses on word vectors. It assumes that a word carries both “the meaning in sentences” and “the position of grammar.” It builds a language model using a recurrent neural network with an attention mechanism and evaluates it on the Penn Treebank, WikiText-2, and NLPCC2017 text datasets. On these datasets, the proposed models achieve better performance as measured by perplexity.
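Perplexity, the evaluation measure named in the abstract, is the exponential of the average negative log-likelihood a language model assigns to the tokens of a held-out text; lower values are better. The following is a minimal sketch of that standard definition, not the authors' evaluation code: the function name `perplexity` and the probability lists are hypothetical and are not taken from the article.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence, given the model's probability for each actual token.

    `token_probs` is a hypothetical input: one probability in (0, 1] per token.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood per token (natural logarithm).
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    # Perplexity is the exponential of that average.
    return math.exp(avg_nll)


# Illustrative values only: a model that always spreads its probability
# evenly over 4 candidate words has a perplexity of about 4.
uniform_probs = [0.25] * 8
print(perplexity(uniform_probs))

# A model that assigns higher probability to the observed tokens
# gets a lower perplexity.
print(perplexity([0.5, 0.5, 0.5]) < perplexity([0.1, 0.1, 0.1]))
```

Intuitively, a perplexity of k means the model is, on average, as uncertain as if it were choosing uniformly among k words at each position.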


Long short-term memory


Neural network language model


Pointer sentinel mixture model


Recurrent highway network


Recurrent neural network


Recurrent neural network language model


Variational recurrent highway network


Variational recurrent neural network



Author information



Corresponding author

Correspondence to Arun Kumar Sangaiah.


About this article


Cite this article

Chen, M., Chiang, H., Sangaiah, A.K. et al. Recurrent neural network with attention mechanism for language model. Neural Comput & Applic 32, 7915–7923 (2020).



  • Language model
  • Recurrent neural network
  • Artificial intelligence
  • Attention mechanism