Attentional Recurrent Neural Networks for Sentence Classification

  • Ankit Kumar
  • Reshma Rastogi (nee Khemchandani)
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 757)


This paper proposes novel attention-based recurrent neural networks for sentence classification. The proposed models apply an attention mechanism to the vanilla Recurrent Neural Network and the Bidirectional Recurrent Neural Network architectures over two kinds of recurrent cells: Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). The resulting models are termed Attentional LSTM (ALSTM), Attentional GRU (AGRU), Attentional Bi-LSTM (ABLSTM), and Attentional Bi-GRU (ABGRU). To improve context construction by the network, pretrained word embeddings are used as input. To assess the efficacy of the proposed framework, we compare our models with other state-of-the-art methods on six benchmark datasets. The proposed attentional models achieve state-of-the-art results on three datasets and outperform the baseline models on four of the six.


Keywords: Sentence classification · Recurrent neural network · LSTM · GRU · Attention mechanism
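The abstract describes attention applied over the hidden states of an LSTM/GRU encoder to form a sentence representation for classification. The paper's exact equations are not given here, so the sketch below shows one common additive-attention pooling formulation as an illustration: hidden states are scored through a learned projection, softmax-normalized into weights, and combined into a weighted sum. The names `attention_pool`, `W`, and `v` are illustrative, not the authors' notation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(H, W, v):
    """Additive attention pooling over recurrent hidden states.

    H: (T, d) hidden states from an LSTM/GRU (or Bi-RNN) encoder,
       one row per token of the sentence.
    W: (d, a) projection matrix, v: (a,) scoring vector
       (both learned jointly with the network in practice).
    Returns the (d,) sentence vector and the (T,) attention weights.
    """
    u = np.tanh(H @ W)       # (T, a) projected hidden states
    scores = u @ v           # (T,) one scalar score per token
    alpha = softmax(scores)  # attention weights, sum to 1
    context = alpha @ H      # (d,) weighted sum of hidden states
    return context, alpha
```

The `context` vector would then be fed to a softmax classifier over the sentence labels; in the bidirectional variants, `H` would hold the concatenated forward and backward hidden states.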



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • Ankit Kumar, Department of Computer Science, South Asian University, New Delhi, India
  • Reshma Rastogi (nee Khemchandani), Department of Computer Science, South Asian University, New Delhi, India
