Extracting Multilingual Natural-Language Patterns for RDF Predicates

Gerber, Daniel; Ngomo, Axel-Cyrille Ngonga

doi:10.1007/978-3-642-33876-2_10

Daniel Gerber & Axel-Cyrille Ngonga Ngomo

Conference paper

1927 Accesses
34 Citations

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7603)

Abstract

Most knowledge sources on the Data Web were extracted from structured or semi-structured data. Thus, they encompass only a small fraction of the information available on the document-oriented Web. In this paper, we present BOA, a bootstrapping strategy for extracting RDF from text. The idea behind BOA is to use background knowledge from the Data Web to extract, from unstructured data, natural-language patterns that represent predicates found on the Data Web. These patterns are then used to extract instance knowledge from natural-language text. This knowledge is finally fed back into the Data Web, thereby closing the loop. The approach followed by BOA is quasi-independent of the language in which the corpus is written. We demonstrate our approach by applying it to four different corpora and two different languages. We evaluate BOA on these data sets using DBpedia as background knowledge. Our results show that we can extract several thousand new facts in one iteration with very high accuracy.
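The bootstrapping loop described in the abstract can be illustrated with a minimal Python sketch. This is not BOA's implementation. The triples, the corpus, the capitalized-word heuristic (a stand-in for named-entity recognition), and the `min_support` threshold are all assumptions made for illustration, and BOA's own pattern scoring is not reproduced. Patterns are learned as the text found between a known subject label and object label, then matched against the corpus to propose new triples, which are added back to the background knowledge.

```python
import re
from collections import Counter, defaultdict

# Illustrative background knowledge: (subject label, predicate, object label).
# In BOA this role is played by a Data Web source such as DBpedia; these
# triples and the corpus below are made up for the example.
KB = {
    ("Leipzig University", "dbo:city", "Leipzig"),
    ("Humboldt University", "dbo:city", "Berlin"),
}
CORPUS = [
    "Leipzig University is located in Leipzig.",
    "Humboldt University is located in Berlin.",
    "Free University is located in Berlin.",
]

# Assumed stand-in for a named-entity recognizer: a run of capitalized words.
ENTITY = r"([A-Z]\w*(?: [A-Z]\w*)*)"


def extract_patterns(kb, corpus, max_words=10):
    """Count the strings found between a known subject and object, per predicate."""
    patterns = defaultdict(Counter)
    for subj, pred, obj in kb:
        for sentence in corpus:
            start = sentence.find(subj)
            if start == -1:
                continue
            end = start + len(subj)
            obj_pos = sentence.find(obj, end)
            if obj_pos == -1:
                continue
            middle = sentence[end:obj_pos].strip()
            # Only subject-before-object order with a short, non-empty gap.
            if middle and len(middle.split()) <= max_words:
                patterns[pred][middle] += 1
    return patterns


def apply_patterns(patterns, corpus, min_support=2):
    """Match sufficiently frequent patterns against the corpus to propose triples."""
    facts = set()
    for pred, counts in patterns.items():
        for pattern, support in counts.items():
            if support < min_support:
                continue
            regex = re.compile(ENTITY + r"\s+" + re.escape(pattern) + r"\s+" + ENTITY)
            for sentence in corpus:
                for m in regex.finditer(sentence):
                    facts.add((m.group(1), pred, m.group(2)))
    return facts


patterns = extract_patterns(KB, CORPUS)
new_facts = apply_patterns(patterns, CORPUS) - KB  # keep only unseen triples
KB |= new_facts  # feed the new knowledge back, closing the loop
print(patterns["dbo:city"])  # Counter({'is located in': 2})
print(new_facts)             # {('Free University', 'dbo:city', 'Berlin')}
```

Because the learned patterns are plain surface strings taken from the corpus itself, the pattern-learning step carries no language-specific rules; in this toy version only the capitalization heuristic would need replacing for another language.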

Author information

Authors and Affiliations

Institut für Informatik, AKSW, Universität Leipzig, Postfach 100920, D-04009, Leipzig, Germany
Daniel Gerber & Axel-Cyrille Ngonga Ngomo

Editor information

Editors and Affiliations

Vrije Universiteit, Amsterdam, The Netherlands
Annette ten Teije
Institute of Computer Science and Business Informatics, University of Mannheim, Germany
Johanna Völker & Heiner Stuckenschmidt
Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland
Siegfried Handschuh
Knowledge Media Institute, The Open University, Milton Keynes, UK
Mathieu d’Aquin & Andriy Nikolov
Institut de Recherche en Informatique, Université de Toulouse, 118, route de Narbonne, 31062, Toulouse Cedex 4, France
Nathalie Aussenac-Gilles
Université de Toulouse, France
Nathalie Hernandez

About this paper

Cite this paper

Gerber, D., Ngomo, AC.N. (2012). Extracting Multilingual Natural-Language Patterns for RDF Predicates. In: ten Teije, A., et al. (eds) Knowledge Engineering and Knowledge Management. EKAW 2012. Lecture Notes in Computer Science, vol 7603. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33876-2_10

DOI: https://doi.org/10.1007/978-3-642-33876-2_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33875-5
Online ISBN: 978-3-642-33876-2
eBook Packages: Computer Science, Computer Science (R0)
