Bi-directional Gated Memory Networks for Answer Selection

  • Wei Wu
  • Houfeng Wang
  • Sujian Li
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10565)


Answer selection is a crucial subtask of the open-domain question answering problem. In this paper, we introduce the Bi-directional Gated Memory Network (BGMN) to model the interactions between question and answer. We match question P and answer Q in two directions. In each direction (for example, P→Q), the sentence representation of P triggers an iterative attention process that aggregates informative evidence from Q. In each iteration, the sentence representation of P and the evidence of Q aggregated so far are passed through a gate that determines their relative importance when attending to every step of Q. Finally, based on the aggregated evidence, the decision is made through a fully connected network. Experimental results on the SemEval-2015 Task 3 dataset demonstrate that our proposed method substantially outperforms several strong baselines. Further experiments show that our model is general and can be applied to other sentence-pair modeling tasks.
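The iterative gated-attention process described above can be sketched in a few lines of NumPy. The abstract does not give the paper's exact equations, so this is only an illustrative approximation of one direction (P→Q): the gate combines P's sentence representation with the evidence accumulated so far, the gated query attends over every step of Q, and the attended result is added to the memory. All parameter names (`W_g`, `W_a`), the sigmoid-gated convex combination, and the additive memory update are assumptions, not the authors' specification.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_attention_hop(p_vec, q_steps, m, W_g, W_a):
    """One assumed iteration of the P->Q direction: gate p_vec against
    the evidence m aggregated so far, attend over every step of Q, and
    return the updated evidence vector."""
    # Gate decides the relative importance of P's representation vs. memory
    g = sigmoid(W_g @ np.concatenate([p_vec, m]))
    query = g * p_vec + (1.0 - g) * m          # gated query (assumed form)
    # Bilinear attention scores over every time step of Q
    scores = np.array([query @ (W_a @ q_t) for q_t in q_steps])
    alpha = softmax(scores)
    evidence = (alpha[:, None] * q_steps).sum(axis=0)
    return m + evidence                         # accumulate evidence

# Toy run with random parameters
rng = np.random.default_rng(0)
d = 8
p_vec = rng.standard_normal(d)                  # sentence representation of P
q_steps = rng.standard_normal((5, d))           # per-step representations of Q
W_g = rng.standard_normal((d, 2 * d))
W_a = rng.standard_normal((d, d))

m = np.zeros(d)
for _ in range(3):                              # a few attention iterations
    m = gated_attention_hop(p_vec, q_steps, m, W_g, W_a)
```

In the full model this would be run in both directions (P→Q and Q→P), with the final aggregated evidence fed to a fully connected network for the selection decision.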


Keywords: Question answering · Attention mechanism · Memory networks



Our work is supported by National Natural Science Foundation of China (No. 61370117, No. 61433015 & No. 61572049).



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Key Laboratory of Computational Linguistics, Ministry of Education, School of Electronics Engineering and Computer Science, Peking University, Beijing, China
