Using Distant Supervision for Extracting Relations on a Large Scale

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 348)

Abstract

Most Information Extraction (IE) systems are designed to extract a restricted number of relations in a specific domain. Recent work on Web-scale knowledge extraction has changed this perspective by introducing large-scale IE systems. Such systems are open-domain and characterized by a large number of relations, which makes traditional approaches, such as handcrafting rules or annotating corpora to train statistical classifiers, difficult to apply in this context. In this article, we present an IE system based on a weakly supervised method for learning relation patterns. This method extracts occurrences of relations from a corpus without supervision and uses them as examples for learning relation patterns. We also present the results of applying this system to the data of the 2010 Knowledge Base Population evaluation campaign.
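The abstract describes the method only at a high level: occurrences of relations are collected from a corpus without manual annotation, then used as examples for learning relation patterns. The Python sketch that follows illustrates this general distant-supervision idea under simplifying assumptions of our own; it is not the authors' system. It assumes known (subject, relation, object) facts from an existing knowledge base, exact string matching of entity names, and a "pattern" defined simply as the words between the two arguments. The helper names (align_facts, pattern_between, learn_patterns), the relation labels and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Assumption: a fact is a (subject, relation, object) triple taken from an
# existing knowledge base; no sentence of the corpus is hand-annotated.
Fact = Tuple[str, str, str]
Example = Tuple[str, str, str, str]  # (relation, subject, object, sentence)


def align_facts(facts: Iterable[Fact], sentences: Iterable[str]) -> List[Example]:
    """Distant-supervision heuristic (hypothetical helper): a sentence that
    mentions both arguments of a known fact is (noisily) taken as an
    occurrence of that fact's relation."""
    facts = list(facts)
    examples: List[Example] = []
    for sentence in sentences:
        for subj, rel, obj in facts:
            # Exact substring matching stands in for real entity recognition.
            if subj in sentence and obj in sentence:
                examples.append((rel, subj, obj, sentence))
    return examples


def pattern_between(subj: str, obj: str, sentence: str) -> Optional[str]:
    """Toy lexical pattern: the text between the first mentions of the two
    arguments, with the arguments replaced by placeholders."""
    i, j = sentence.find(subj), sentence.find(obj)
    if i < 0 or j < 0 or i == j:
        return None  # missing argument, or both start at the same position
    if i < j:
        middle = sentence[i + len(subj):j].strip()
        return f"<SUBJ> {middle} <OBJ>"
    middle = sentence[j + len(obj):i].strip()
    return f"<OBJ> {middle} <SUBJ>"


def learn_patterns(facts: Iterable[Fact], sentences: Iterable[str],
                   min_count: int = 1) -> Dict[str, Counter]:
    """Count patterns per relation and keep those seen at least min_count times."""
    patterns: Dict[str, Counter] = defaultdict(Counter)
    for rel, subj, obj, sentence in align_facts(facts, sentences):
        p = pattern_between(subj, obj, sentence)
        if p is not None:
            patterns[rel][p] += 1
    return {rel: Counter({p: c for p, c in counts.items() if c >= min_count})
            for rel, counts in patterns.items()}


if __name__ == "__main__":
    # Illustrative toy data (not taken from the paper or the KBP corpus).
    facts = [
        ("Barack Obama", "per:spouse", "Michelle Obama"),
        ("Apple", "org:founded_by", "Steve Jobs"),
    ]
    sentences = [
        "Barack Obama married Michelle Obama in 1992.",
        "Steve Jobs co-founded Apple in 1976.",
        "Apple was founded by Steve Jobs and Steve Wozniak.",
    ]
    for rel, counts in learn_patterns(facts, sentences).items():
        print(rel, dict(counts))
```

On the toy data, the spouse relation yields the single pattern "<SUBJ> married <OBJ>", while the founded_by relation yields "<OBJ> co-founded <SUBJ>" and "<SUBJ> was founded by <OBJ>". The alignment step is noisy by construction: a sentence such as "Steve Jobs left Apple" would also be collected as an example of founded_by, which is one reason such approaches generally need some filtering of the automatically collected examples.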

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jean-Louis, L., Besançon, R., Ferret, O., Durand, A. (2013). Using Distant Supervision for Extracting Relations on a Large Scale. In: Fred, A., Dietz, J.L.G., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2011. Communications in Computer and Information Science, vol 348. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37186-8_9

  • DOI: https://doi.org/10.1007/978-3-642-37186-8_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37185-1

  • Online ISBN: 978-3-642-37186-8

  • eBook Packages: Computer Science, Computer Science (R0)
