Abstract
Distant supervision can generate a huge amount of training data. Recently, multi-instance multi-label learning has been applied to distant supervision to combat noisy data and improve the performance of relation extraction. However, multi-instance multi-label learning uses only hidden variables when inferring the relation between entities, and therefore cannot make full use of the training data. Moreover, traditional lexical and syntactic features poorly reflect domain knowledge and the global information of a sentence, which limits the system's performance. This paper presents a novel approach to multi-instance multi-label learning that adopts the idea of fuzzy classification: we use cluster centers as training data, which allows us to fully exploit sentence-level features. Meanwhile, we extend the feature set with paragraph vectors, which carry the semantic information of sentences. We conduct an extensive empirical study to verify our contributions. The results show that our method outperforms the state-of-the-art distantly supervised baseline.
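To make the cluster-center idea concrete, the sketch below (an illustrative assumption, not the authors' code) represents a bag of sentence-level feature vectors for one entity pair by its centroid, so that a single representative vector can serve as a training instance instead of individual noisy sentences:

```python
# Hypothetical sketch: replace a bag of sentence-level feature vectors
# with its cluster center (element-wise mean), yielding one training
# instance per entity pair. Feature values and dimensions are invented.

def centroid(vectors):
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]

# Three made-up sentence feature vectors mentioning the same entity pair.
bag = [
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
    [2.0, 1.0, 1.0],
]
center = centroid(bag)  # -> [2.0, 1.0, 1.0]
```

In the paper's setting, each vector would also be concatenated with a paragraph-vector embedding of the sentence before the centroid is taken; the details of the clustering itself are described in the full text.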
Copyright information
© 2016 Springer Science+Business Media Singapore
Cite this paper
Liu, Y., Xu, W. (2016). Using Distant Supervision and Paragraph Vector for Large Scale Relation Extraction. In: Chen, W., et al. Big Data Technology and Applications. BDTA 2015. Communications in Computer and Information Science, vol 590. Springer, Singapore. https://doi.org/10.1007/978-981-10-0457-5_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-0456-8
Online ISBN: 978-981-10-0457-5