Abstract
Most Information Extraction (IE) systems are designed to extract a restricted number of relations in a specific domain. Recent work on Web-scale knowledge extraction has changed this perspective by introducing large-scale IE systems. Such systems are open-domain and characterized by a large number of relations, which makes traditional approaches, such as handcrafting rules or annotating corpora for training statistical classifiers, difficult to apply in such a context. In this article, we present an IE system based on a weakly supervised method for learning relation patterns. This method extracts occurrences of relations from a corpus without supervision and uses them as examples for learning relation patterns. We also present the results of applying this system to the data of the 2010 Knowledge Base Population evaluation campaign.
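The general idea of learning relation patterns from automatically collected examples can be illustrated with a minimal sketch. It is not the system described in this article: the seed facts, the toy corpus, the function names, and the frequency threshold are all hypothetical, and the patterns here are simply the raw text found between an entity and a known value.

```python
import re
from collections import Counter, defaultdict

# Hypothetical seed facts (entity, relation, value); illustrative only.
KB = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Alan Turing", "born_in", "London"),
]

# Hypothetical toy corpus standing in for a large document collection.
CORPUS = [
    "Marie Curie was born in Warsaw in 1867.",
    "Alan Turing was born in London and studied at Cambridge.",
    "Alan Turing visited London often.",
]


def collect_patterns(kb, corpus):
    """Count the text found between a known entity and its known value."""
    patterns = defaultdict(Counter)
    for entity, relation, value in kb:
        for sentence in corpus:
            start = sentence.find(entity)
            if start == -1:
                continue
            end = sentence.find(value, start + len(entity))
            if end == -1:
                continue
            middle = sentence[start + len(entity):end].strip()
            if middle:
                patterns[relation][middle] += 1
    return patterns


def select_patterns(patterns, min_count=2):
    """Keep patterns seen at least min_count times (assumed noise filter)."""
    return {
        relation: [p for p, c in counts.items() if c >= min_count]
        for relation, counts in patterns.items()
    }


def apply_patterns(selected, entity, sentence):
    """Use the kept patterns to propose values for a new entity."""
    results = []
    for relation, pats in selected.items():
        for pat in pats:
            regex = re.escape(entity) + r"\s+" + re.escape(pat) + r"\s+([A-Z]\w+)"
            match = re.search(regex, sentence)
            if match:
                results.append((relation, match.group(1)))
    return results


selected = select_patterns(collect_patterns(KB, CORPUS))
found = apply_patterns(selected, "Ada Lovelace",
                       "Ada Lovelace was born in London in 1815.")
print(selected, found)
```

In this toy run, the string "visited" is also collected as a candidate for born_in because it happens to link a known entity to its known value, which shows why automatically gathered examples are noisy. The frequency threshold is one simple, assumed way to filter such candidates.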
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jean-Louis, L., Besançon, R., Ferret, O., Durand, A. (2013). Using Distant Supervision for Extracting Relations on a Large Scale. In: Fred, A., Dietz, J.L.G., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2011. Communications in Computer and Information Science, vol 348. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37186-8_9
DOI: https://doi.org/10.1007/978-3-642-37186-8_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37185-1
Online ISBN: 978-3-642-37186-8
eBook Packages: Computer Science, Computer Science (R0)