Abstract
Knowledge graphs have become vital resources for semantic search and provide users with precise answers to their information needs. Knowledge graphs often consist of billions of facts, typically encoded in the form of RDF triples. In most cases, these facts are extracted automatically and can thus be susceptible to errors. For many applications, it can therefore be very useful to complement knowledge graph facts with textual evidence. For instance, it can help users make informed decisions about the validity of the facts that are returned as part of an answer to a query. In this paper, we therefore propose , an approach that given a knowledge graph and a text corpus, retrieves the top-k most relevant textual passages for a given set of facts. Since our goal is to retrieve short passages, we develop a set of IR models combining exact matching through the Okapi BM25 model with semantic matching using word embeddings. To evaluate our approach, we built an extensive benchmark consisting of facts extracted from YAGO and text passages retrieved from Wikipedia. Our experimental results demonstrate the effectiveness of our approach in retrieving textual evidence for knowledge graph facts.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsReferences
Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: DBpedia: a nucleus for a web of open data. In: Aberer, K., et al. (eds.) ASWC/ISWC -2007. LNCS, vol. 4825, pp. 722–735. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-76298-0_52
Bhatia, S., Dwivedi, P., Kaur, A.: Tell me why is it so? Explaining knowledge graph relationships by finding descriptive support passages. In: ISWC (2018)
Brokos, G.I., Malakasiotis, P., Androutsopoulos, I.: Using centroids of word embeddings and word mover’s distance for biomedical document retrieval in question answering. In: ACL, p. 114 (2016)
Clinchant, S., Perronnin, F.: Aggregating continuous word embeddings for information retrieval. In: ACL, pp. 100–109 (2013)
Daciuk, J., Mihov, S., Watson, B.W., Watson, R.E.: Incremental construction of minimal acyclic finite-state automata. Comput. Linguist. 26(1), 3–16 (2000)
Elbassuoni, S., Hose, K., Metzger, S., Schenkel, R.: ROXXI: Reviving witness dOcuments to eXplore eXtracted Information. PVLDB 3(2), 1589–1592 (2010)
Gad-Elrab, M., Stepanova, D., Urbani, J., Weikum, G.: ExFaKT: a framework for explaining facts over knowledge graphs and text. In: WSDM (2019)
Galke, L., Saleh, A., Scherp, A.: Word embeddings for practical information retrieval. In: INFORMATIK (2017)
Gerber, D., et al.: Defacto - temporal and multilingual deep fact validation. Web Semant. 35, 85–101 (2015)
Gerber, D., Ngomo, A.C.N.: Bootstrapping the linked data web. In: Workshop on Web Scale Knowledge Extraction (2011)
Guo, J., Fan, Y., Ai, Q., Croft, W.B.: Semantic matching by non-linear word transportation for information retrieval. In: CIKM, pp. 701–710 (2016)
Järvelin, K., Kekäläinen, J.: Cumulated gain-based evaluation of IR techniques. ACM TOIS 20(4), 422–446 (2002)
Jones, K.S., Walker, S., Robertson, S.E.: A probabilistic model of information retrieval: development and comparative experiments: Part 2. Inf. Process. Manage. 36(6), 809–840 (2000)
Kenter, T., De Rijke, M.: Short text similarity with word embeddings. In: CIKM, pp. 1411–1420 (2015)
Kusner, M., Sun, Y., Kolkin, N., Weinberger, K.: From word embeddings to document distances. In: ICML, pp. 957–966 (2015)
Landauer, T.K., Dumais, S.T.: A solution to Plato’s problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychol. Rev. 104(2), 211 (1997)
Le, Q., Mikolov, T.: Distributed representations of sentences and documents. In: ICML, pp. 1188–1196 (2014)
Metzger, S., Elbassuoni, S., Hose, K., Schenkel, R.: S3K: seeking statement-supporting top-K witnesses. In: CIKM, pp. 37–46 (2011)
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS, pp. 3111–3119 (2013)
Nakashole, N., Weikum, G., Suchanek, F.: PATTY: a taxonomy of relational patterns with semantic types. In: EMNLP (2012)
Pennington, J., Socher, R., Manning, C.: Glove: global vectors for word representation. In: EMNLP, pp. 1532–1543 (2014)
Robertson, S.E., Walker, S., Jones, S., Hancock-Beaulieu, M., Gatford, M.: Okapi at TREC-3. In: TREC. vol. Special Publication 500–225, pp. 109–126. National Institute of Standards and Technology (NIST) (1994)
Suchanek, F.M., Kasneci, G., Weikum, G.: YAGO: a core of semantic knowledge. In: WWW, pp. 697–706 (2007)
Vulić, I., Moens, M.F.: Monolingual and cross-lingual information retrieval models based on (bilingual) word embeddings. In: SIGIR, pp. 363–372 (2015)
Zhou, G., He, T., Zhao, J., Hu, P.: Learning continuous word embedding with metadata for question retrieval in community question answering. In: ACL, vol. 1, pp. 250–259 (2015)
Acknowledgments
This research was partially funded by the Danish Council for Independent Research (DFF) under grant agreement no. DFF-8048-00051B and Aalborg University’s Talent Programme.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Ercan, G., Elbassuoni, S., Hose, K. (2019). Retrieving Textual Evidence for Knowledge Graph Facts. In: Hitzler, P., et al. The Semantic Web. ESWC 2019. Lecture Notes in Computer Science(), vol 11503. Springer, Cham. https://doi.org/10.1007/978-3-030-21348-0_4
Download citation
DOI: https://doi.org/10.1007/978-3-030-21348-0_4
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21347-3
Online ISBN: 978-3-030-21348-0
eBook Packages: Computer ScienceComputer Science (R0)