Using Distant Supervision for Extracting Relations on a Large Scale

Jean-Louis, Ludovic; Besançon, Romaric; Ferret, Olivier; Durand, Adrien

doi:10.1007/978-3-642-37186-8_9

Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 348)

Abstract

Most Information Extraction (IE) systems are designed to extract a restricted number of relations in a specific domain. Recent work on Web-scale knowledge extraction has changed this perspective by introducing large-scale IE systems. Such systems are open-domain and cover a large number of relations, which makes traditional approaches, such as handcrafting rules or annotating corpora to train statistical classifiers, difficult to apply in such a context. In this article, we present an IE system based on a weakly supervised method for learning relation patterns. This method extracts occurrences of relations from a corpus without supervision and uses them as examples for learning relation patterns. We also present the results of applying this system to the data of the 2010 Knowledge Base Population evaluation campaign.
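The general idea of distant supervision described in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the toy knowledge base, the sentences, the `<SOURCE>`/`<TARGET>` placeholders, the function names, and the `min_count` threshold are all assumptions made for illustration. Known entity pairs are matched against raw sentences, the text between the two entities is kept as a candidate pattern, and only patterns seen often enough are retained.

```python
from collections import Counter

# Hypothetical toy knowledge base: (relation, source entity, target entity).
KB = [
    ("founder_of", "Alice Martin", "Acme"),
    ("founder_of", "Bob Stone", "Globex"),
]

# Hypothetical unannotated corpus.
SENTENCES = [
    "Alice Martin founded Acme in 1999.",
    "Bob Stone founded Globex last year.",
    "Acme hired Alice Martin's brother.",  # noisy match: both entities, wrong relation
]


def find_examples(kb, sentences):
    """Collect (relation, template) pairs from sentences that mention
    both entities of a known pair. No manual annotation is involved."""
    examples = []
    for relation, source, target in kb:
        for sentence in sentences:
            s = sentence.find(source)
            t = sentence.find(target)
            if s == -1 or t == -1:
                continue  # one of the entities is missing
            if s + len(source) <= t:
                middle = sentence[s + len(source):t]
                template = "<SOURCE>" + middle + "<TARGET>"
            elif t + len(target) <= s:
                middle = sentence[t + len(target):s]
                template = "<TARGET>" + middle + "<SOURCE>"
            else:
                continue  # overlapping mentions, skip
            examples.append((relation, template))
    return examples


def learn_patterns(examples, min_count=2):
    """Keep templates observed at least min_count times for a relation."""
    counts = Counter(examples)
    return {key for key, n in counts.items() if n >= min_count}


patterns = learn_patterns(find_examples(KB, SENTENCES))
print(patterns)
```

The third sentence shows why a frequency filter (or some other filtering step) is useful: co-occurrence of two related entities does not guarantee that the sentence expresses the relation, so automatically collected examples are noisy.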

Author information

Authors and Affiliations

CEA LIST, Vision and Content Engineering Laboratory, Gif-sur-Yvette, F-91191, France
Ludovic Jean-Louis, Romaric Besançon, Olivier Ferret & Adrien Durand

Editor information

Editors and Affiliations

IST - Technical University of Lisbon, Av. Rovisco Pais, 1, 1049-001, Lisbon, Portugal
Ana Fred
Delft University of Technology, Mekelweg 4, 2628 CD, Delft, The Netherlands
Jan L. G. Dietz
Informatics Research Centre, Henley Business School, University of Reading, RG6 6UD, Reading, UK
Kecheng Liu
INSTICC and IPS, Estefanilha, Setúbal, Portugal
Joaquim Filipe

About this paper

Cite this paper

Jean-Louis, L., Besançon, R., Ferret, O., Durand, A. (2013). Using Distant Supervision for Extracting Relations on a Large Scale. In: Fred, A., Dietz, J.L.G., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2011. Communications in Computer and Information Science, vol 348. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37186-8_9

DOI: https://doi.org/10.1007/978-3-642-37186-8_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37185-1
Online ISBN: 978-3-642-37186-8
eBook Packages: Computer Science, Computer Science (R0)
