Bi-directional attention comparison for semantic sentence matching

  • Huiyuan Lai
  • Yizheng Tao (email author)
  • Chunliu Wang
  • Lunfan Xu
  • Dingyong Tang
  • Gongliang Li


Semantic sentence matching, also known as text similarity calculation, is one of the most important problems in natural language processing. Existing deep models mostly rely on neural networks with attention mechanisms. In this paper, we present a deep architecture for matching two Chinese sentences. After an attention mechanism extracts interaction information between the sentence pair, the model relies only on alignment rather than a long short-term memory (LSTM) network, which makes it simpler and more lightweight. To capture sufficient semantic features, we use max pooling and average pooling together with an additional operation, attention-pooling, to aggregate information from the whole sentence. A multilayer perceptron classifier then produces the final matching score. Experiments on the ATEC-NLP dataset demonstrate the effectiveness of our approach.
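The pipeline summarized above (attention-based alignment in both directions, then max, average, and attention pooling) can be illustrated with a small sketch. The abstract does not give the model's equations, so everything specific here is an assumption: dot-product attention scores, a comparison vector built from each token, its aligned counterpart, their difference, and their element-wise product, and a single hypothetical `query` vector for attention-pooling. The function names are illustrative and the final multilayer perceptron is omitted.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def soft_align(a, b):
    """For each token vector in `a`, return an attention-weighted sum of `b`.
    Assumption: plain dot-product scores; the paper may use another scoring function."""
    dim = len(b[0])
    aligned = []
    for a_i in a:
        weights = softmax([dot(a_i, b_j) for b_j in b])
        aligned.append([sum(w * b_j[k] for w, b_j in zip(weights, b))
                        for k in range(dim)])
    return aligned

def compare(a, aligned):
    """Concatenate each token with its aligned vector, their difference and product."""
    return [a_i + al_i
            + [x - y for x, y in zip(a_i, al_i)]
            + [x * y for x, y in zip(a_i, al_i)]
            for a_i, al_i in zip(a, aligned)]

def pool(vectors, query):
    """Concatenate max pooling, average pooling and attention-pooling.
    `query` is a hypothetical stand-in for a learned parameter vector."""
    dim = len(vectors[0])
    max_p = [max(v[k] for v in vectors) for k in range(dim)]
    avg_p = [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]
    weights = softmax([dot(v, query) for v in vectors])
    att_p = [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]
    return max_p + avg_p + att_p

def match_features(a, b, query):
    """Align a to b and b to a, then pool each side into a fixed-size vector."""
    va = compare(a, soft_align(a, b))
    vb = compare(b, soft_align(b, a))
    return pool(va, query) + pool(vb, query)

# Toy 2-dimensional "embeddings" for a 3-token and a 2-token sentence.
a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [[0.5, 0.5], [1.0, 0.0]]
query = [0.1] * 8  # comparison vectors have 4 * 2 = 8 dimensions
features = match_features(a, b, query)
print(len(features))  # 2 sentences x 3 poolings x 8 dimensions = 48
```

Because no recurrent layer follows the alignment step, the feature vector has a fixed size regardless of sentence length. In a full model it would be passed to a classifier to produce the matching score.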


Keywords: Semantic matching, Alignment, Attention mechanism, Chinese



We thank Ant Financial for allowing us to use the dataset from Ant Financial Artificial Competition for experiments. An earlier version of this paper was presented at the International Conference on International Symposium on Artificial Intelligence and Robotics.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Huiyuan Lai (1)
  • Yizheng Tao (1, email author)
  • Chunliu Wang (1)
  • Lunfan Xu (1)
  • Dingyong Tang (1)
  • Gongliang Li (1)

  1. Institute of Computer Application, China Academy of Engineering Physics, Mianyang, China
