An End-to-End Entity and Relation Extraction Network with Multi-head Attention

  • Lishuang Li
  • Yuankai Guo
  • Shuang Qian
  • Anqiao Zhou
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11221)


Abstract

Relation extraction is an important semantic processing task in natural language processing. State-of-the-art systems usually rely on elaborately designed features, which are time-consuming to construct and may generalize poorly. Moreover, most existing systems adopt pipeline methods, which split the task into two separate stages: named entity recognition and relation extraction. Pipeline methods suffer from two problems: (1) they over-simplify the task into two independent parts, and (2) errors made in named entity recognition propagate to relation extraction. We therefore present a novel joint model for entity and relation extraction based on multi-head attention, which avoids the problems of pipeline methods and reduces the dependence on feature engineering. Experimental results show that our model achieves good performance without extra features: it reaches an F-score of 85.7% on the SemEval-2010 Task 8 relation extraction dataset, which is competitive with previous joint models without using extra features. On publication, our code will be made publicly available.
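The paper's full architecture is not reproduced on this page, but the multi-head attention mechanism the model builds on can be sketched as scaled dot-product self-attention split across several heads. The following is a minimal NumPy illustration, not the authors' implementation: the random projection matrices `Wq`, `Wk`, `Wv`, `Wo` stand in for learned parameters, and all names and dimensions are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, num_heads, rng):
    """Scaled dot-product multi-head self-attention over one token sequence.

    X: (seq_len, d_model) token representations.
    Returns: (seq_len, d_model) attended representations.
    """
    seq_len, d_model = X.shape
    assert d_model % num_heads == 0, "d_model must divide evenly across heads"
    d_k = d_model // num_heads

    # Random projections stand in for the learned Q/K/V/output weights.
    Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                      for _ in range(4))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv

    heads = []
    for h in range(num_heads):
        s = slice(h * d_k, (h + 1) * d_k)
        # Each head attends over the whole sequence in its own subspace.
        scores = Q[:, s] @ K[:, s].T / np.sqrt(d_k)   # (seq_len, seq_len)
        heads.append(softmax(scores) @ V[:, s])       # (seq_len, d_k)

    # Concatenate the heads and project back to the model dimension.
    return np.concatenate(heads, axis=1) @ Wo

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))                 # 5 tokens, model dim 8
out = multi_head_attention(X, num_heads=2, rng=rng)
print(out.shape)                                # (5, 8)
```

In a joint extraction setting, the attended token representations would feed both the entity tagger and the relation classifier, letting each token weigh every other token when predicting labels.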


Keywords: Relation extraction · End-to-end joint extraction · Named entity recognition


References

  1. Zeng, D., Liu, K., Lai, S., Zhou, G., Zhao, J.: Relation classification via convolutional deep neural network. In: Proceedings of COLING, pp. 2335–2344 (2014)
  2. Xu, Y., Mou, L., Li, G., Chen, Y., Peng, H., Jin, Z.: Classifying relations via long short term memory networks along shortest dependency paths. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1785–1794 (2015)
  3. Zhou, P., Shi, W., Tian, J., et al.: Attention-based bidirectional long short-term memory networks for relation classification. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pp. 207–212 (2016)
  4. Li, Q., Ji, H.: Incremental joint extraction of entity mentions and relations. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 402–412 (2014)
  5. Miwa, M., Bansal, M.: End-to-end relation extraction using LSTMs on sequences and tree structures. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pp. 1105–1116 (2016)
  6. Miwa, M., Sasaki, Y.: Modeling joint entity and relation extraction with table representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1858–1869 (2014)
  7. Katiyar, A., Cardie, C.: Going out on a limb: joint extraction of entity mentions and relations without dependency trees. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pp. 917–928 (2017)
  8. Zheng, S., Wang, F., Bao, H., Hao, Y., Zhou, P., Xu, B.: Joint extraction of entities and relations based on a novel tagging scheme. arXiv preprint arXiv:1706.05075 (2017)
  9. Hendrickx, I., et al.: SemEval-2010 task 8: multi-way classification of semantic relations between pairs of nominals. In: Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions, pp. 94–99 (2009)
  10. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
  11. Lafferty, J., McCallum, A., Pereira, F.C.N.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of ICML, vol. 3, pp. 282–289 (2001)
  12. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
  13. Pennington, J., Socher, R., Manning, C.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)
  14. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
  15. Zhang, D., Wang, D.: Relation classification via recurrent neural network. arXiv preprint arXiv:1508.01006 (2015)
  16. Santos, C.N.D., Xiang, B., Zhou, B.: Classifying relations by ranking with convolutional neural networks. arXiv preprint arXiv:1504.06580 (2015)
  17. Xu, K., Feng, Y., Huang, S., Zhao, D.: Semantic relation classification via convolutional neural networks with simple negative sampling. arXiv preprint arXiv:1506.07650 (2015)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

Lishuang Li (corresponding author), Yuankai Guo, Shuang Qian, Anqiao Zhou

  1. Dalian University of Technology, Dalian, China
