Abstract
Web-scale relation extraction is a means of building and extending large repositories of formalized knowledge. This type of automated knowledge building requires a high level of precision, which is hard to achieve with rule sets learned automatically from unlabeled data by means of distant or minimal supervision. This paper shows how the precision of relation extraction can be considerably improved by employing a wide-coverage, general-purpose lexical semantic network, namely BabelNet, for effective semantic rule filtering. We apply Word Sense Disambiguation to the content words of the automatically extracted rules. This yields a set of relation-specific relevant concepts, and these concepts are then used to represent the structured semantics of the corresponding relation. The resulting relation-specific subgraphs of BabelNet serve as semantic filters for estimating the adequacy of the extracted rules. For the seven semantic relations tested here, the semantic filter consistently yields higher precision at every relative recall value in the high-recall range.
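The filtering pipeline described above can be illustrated with a minimal sketch. It rests on several loudly stated assumptions. First, the content words of each rule are taken to be already disambiguated into concept IDs. Second, the IDs (`marry#v`, `wife#n`, ...) and the tiny graph are hypothetical stand-ins for BabelNet synsets and their links, not BabelNet's API or data. Third, "relation-specific concepts" are approximated as those shared by at least two rules, the subgraph is approximated as a two-hop breadth-first expansion, and a rule is kept when at least half of its concepts fall inside that subgraph. These are illustrative choices, not the authors' exact scoring method.

```python
from collections import Counter
from typing import Dict, List, Set

# Toy stand-in for a lexical semantic network such as BabelNet (hypothetical data):
# each concept ID maps to the set of concepts it is directly linked to.
Graph = Dict[str, Set[str]]


def relation_concepts(rules: List[List[str]], min_count: int = 2) -> Set[str]:
    """Relation-specific concepts: senses occurring in at least `min_count` rules.

    `rules` holds the already disambiguated content words of each extracted rule.
    The frequency threshold is an assumed heuristic, not the paper's criterion.
    """
    counts = Counter(c for rule in rules for c in set(rule))
    return {c for c, n in counts.items() if n >= min_count}


def relation_subgraph(graph: Graph, seeds: Set[str], hops: int = 2) -> Set[str]:
    """Expand the seed concepts breadth-first by up to `hops` edges."""
    covered, frontier = set(seeds), set(seeds)
    for _ in range(hops):
        reached: Set[str] = set()
        for concept in frontier:
            reached |= graph.get(concept, set())
        frontier = reached - covered  # only newly reached concepts are expanded next
        covered |= frontier
    return covered


def filter_score(rule: List[str], subgraph: Set[str]) -> float:
    """Share of a rule's concepts that lie inside the relation subgraph."""
    if not rule:
        return 0.0
    return sum(c in subgraph for c in rule) / len(rule)


def semantic_filter(rules: List[List[str]], graph: Graph, threshold: float = 0.5) -> List[bool]:
    """Keep (True) or discard (False) each rule learned for one relation."""
    subgraph = relation_subgraph(graph, relation_concepts(rules))
    return [filter_score(rule, subgraph) >= threshold for rule in rules]


if __name__ == "__main__":
    toy_graph: Graph = {
        "marry#v": {"wedding#n", "spouse#n"},
        "wife#n": {"spouse#n", "husband#n"},
        "spouse#n": {"divorce#v"},
        "meet#v": {"concert#n"},
    }
    marriage_rules = [
        ["marry#v", "wife#n"],
        ["marry#v", "wedding#n"],
        ["wife#n", "divorce#v"],
        ["meet#v", "concert#n"],  # a noisy rule picked up from mere co-occurrence
    ]
    print(semantic_filter(marriage_rules, toy_graph))  # [True, True, True, False]
```

Raising `threshold` trades recall for precision. Ranking rules by `filter_score` instead of applying a single cutoff makes it possible to read off precision at different relative recall levels.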
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Moro, A., Li, H., Krause, S., Xu, F., Navigli, R., Uszkoreit, H. (2013). Semantic Rule Filtering for Web-Scale Relation Extraction. In: Alani, H., et al. The Semantic Web – ISWC 2013. ISWC 2013. Lecture Notes in Computer Science, vol 8218. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41335-3_22
DOI: https://doi.org/10.1007/978-3-642-41335-3_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41334-6
Online ISBN: 978-3-642-41335-3
eBook Packages: Computer Science, Computer Science (R0)