Double Attention Mechanism for Sentence Embedding

  • Miguel Kakanakou
  • Hongwei Xie
  • Yan Qiang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11242)


This paper proposes a new model for sentence embedding, an important topic in natural language processing, that uses a double attention mechanism to combine a recurrent neural network (RNN) and a convolutional neural network (CNN). First, the model uses a bidirectional long short-term memory recurrent neural network (RNN-LSTM) with a self-attention mechanism to compute a first representation of the sentence, called the primitive representation. The primitive representation is then used together with a CNN equipped with a pooling-based attention mechanism to compute a set of attention weights that are applied during the pooling step. The final sentence representation is obtained by concatenating the output of the CNN with the primitive sentence representation. The double attention mechanism helps the model retain more of the information contained in the sentence and thus generate a more representative feature vector. The model can be trained end-to-end with few hyper-parameters. We evaluate the model on three benchmark datasets for the sentence classification task and compare it with state-of-the-art methods. Experimental results show that the proposed model yields a significant performance gain over other sentence embedding methods on all three datasets.
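The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the dot-product scoring, the `query` vector, and all function names below are assumptions made for illustration. The BiLSTM hidden states and the CNN feature vectors are taken as given inputs, and both are assumed to share the same dimension so that they can be compared directly.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights, vectors):
    """Sum of the vectors, each scaled by its weight."""
    dim = len(vectors[0])
    return [sum(w * vec[j] for w, vec in zip(weights, vectors)) for j in range(dim)]


def self_attention(hidden_states, query):
    """First attention: score each hidden state against a query vector.

    `query` stands in for learned parameters (an assumption of this sketch).
    Returns the primitive representation and the attention weights.
    """
    weights = softmax([dot(h, query) for h in hidden_states])
    return weighted_sum(weights, hidden_states), weights


def attention_pooling(feature_vectors, primitive):
    """Second attention: pool CNN features, weighted by similarity to the primitive representation."""
    weights = softmax([dot(c, primitive) for c in feature_vectors])
    return weighted_sum(weights, feature_vectors), weights


def double_attention_embedding(hidden_states, feature_vectors, query):
    """Concatenate the attention-pooled CNN output with the primitive representation."""
    primitive, _ = self_attention(hidden_states, query)
    pooled, _ = attention_pooling(feature_vectors, primitive)
    return pooled + primitive  # list concatenation


# Toy inputs: three time steps, two dimensions each (made-up values).
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
feature_vectors = [[0.5, 0.1], [0.2, 0.9], [0.3, 0.3]]
query = [0.5, -0.5]

embedding = double_attention_embedding(hidden_states, feature_vectors, query)
```

Here `embedding` has four entries: two from the pooled CNN features followed by two from the primitive representation. In a trained model the scoring functions would be learned rather than fixed dot products.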


Keywords: Bidirectional LSTM · Convolutional neural network · Sentence embedding · Pooling-based attention mechanism · Self-attention mechanism



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan, China
