
Improving Few Occurrence Feature Performance in Distant Supervision for Relation Extraction

  • Conference paper
Advanced Data Mining and Applications (ADMA 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8347)


Abstract

Distant supervision is an active topic in relation extraction research. Instead of relying on annotated text, distant supervision uses a knowledge base as the source of supervision. For each pair of entities that appears in some relation in the knowledge base, this approach finds all sentences containing those entities in a large unlabeled corpus and extracts textual features to train a relation classifier. The automatic labeling provides a large amount of data, but the data have a serious problem: most features appear only a few times in the training data, and such sparse evidence makes these features very susceptible to noise, which leads to a flawed classifier. In this paper, we propose a method to improve the performance of few-occurrence features in distant supervision for relation extraction. We present a novel model for calculating the similarity between a feature and an entity pair, and then adjust the entity pair's features according to this similarity. The experiments show that our method improves the performance of relation extraction.
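The labeling step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy knowledge base `KB`, the three-sentence `CORPUS`, and the helpers `between_words`, `distant_label`, and `rare_features` are all hypothetical, and the "text between the two entities" feature is a stand-in for the richer textual features a real system would extract. The sketch only shows how automatic labeling produces features that occur very few times, some of them noisy.

```python
from collections import Counter, defaultdict

# Hypothetical toy knowledge base: (entity1, entity2) -> relation.
KB = {
    ("Barack Obama", "Honolulu"): "place_of_birth",
    ("Steve Jobs", "Apple"): "founder_of",
}

# Hypothetical unlabeled corpus.
CORPUS = [
    "Barack Obama was born in Honolulu .",
    "Barack Obama visited Honolulu last week .",
    "Steve Jobs founded Apple in 1976 .",
]


def between_words(sentence, e1, e2):
    """Return the text between e1 and e2 as a crude lexical feature, or None."""
    i = sentence.find(e1)
    j = sentence.find(e2)
    if i == -1 or j == -1 or i + len(e1) > j:
        return None
    return sentence[i + len(e1):j].strip()


def distant_label(kb, corpus):
    """For each KB entity pair, collect a feature from every sentence mentioning both."""
    features = defaultdict(list)
    for (e1, e2), relation in kb.items():
        for sentence in corpus:
            feat = between_words(sentence, e1, e2)
            if feat is not None:
                features[(e1, e2, relation)].append(feat)
    return dict(features)


def rare_features(features, max_count=1):
    """Return the features that occur at most max_count times in the labeled data."""
    counts = Counter(f for feats in features.values() for f in feats)
    return {f for f, c in counts.items() if c <= max_count}


labeled = distant_label(KB, CORPUS)
rare = rare_features(labeled)
```

In this toy run, the sentence about a visit is labeled `place_of_birth` simply because it mentions both entities, so the feature `visited` becomes a noisy training signal. Because it occurs only once, nothing in the data can outvote it, which is the kind of few-occurrence feature the abstract is concerned with.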



Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, H., Zhao, Y. (2013). Improving Few Occurrence Feature Performance in Distant Supervision for Relation Extraction. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science, vol 8347. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53917-6_37


  • DOI: https://doi.org/10.1007/978-3-642-53917-6_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-53916-9

  • Online ISBN: 978-3-642-53917-6

  • eBook Packages: Computer Science, Computer Science (R0)
