A Corpus-Based Approach for the Induction of Ontology Lexica
While many large knowledge bases (e.g. Freebase, Yago, DBpedia) and linked data sets are available on the web, they typically lack lexical information stating how their properties and classes are realized lexically. Usually at most one label is attached to each property, so deeper syntactic information is missing, e.g. about syntactic arguments and how they map to the semantic arguments of the property, or about possible lexical variants and paraphrases. Lexicon models such as lemon make it possible to define a lexicon for a given ontology, but creating and maintaining such lexica requires substantial manual effort. To lower this effort, we present a semi-automatic approach that exploits a corpus to find occurrences in which a given property is expressed and generalizes over these occurrences by extracting dependency paths, which can serve as a basis for creating lemon lexicon entries. We evaluate the resulting automatically generated lexica using DBpedia as the dataset and Wikipedia as the corresponding corpus, in two modes: an automatic mode, in which the output is compared to a manually created lexicon, and a semi-automatic mode, in which a lexicon engineer inspects the results of the corpus-based approach and adds them to the existing lexicon where appropriate.
Keywords: ontology lexicalization, corpus-based approach, lemon
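The core idea in the abstract, generalizing over sentences that express a property by extracting the dependency path between its two arguments, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not specify the parser, the pattern format, or how patterns are ranked, so the hand-written toy parses, the dependency labels, the `X`/`Y` placeholders, and the function names (`dependency_path`, `pattern`, `count_patterns`) are hypothetical and not taken from the paper.

```python
from collections import Counter, deque

# Hypothetical toy data: each sentence is (parse, subject_index, object_index).
# A parse is a list of (token, head_index, dependency_label); the root has head -1.
# These parses are hand-written; a real system would obtain them from a parser.
SENTENCES = [
    ([("Michelle", 2, "nsubjpass"), ("is", 2, "auxpass"), ("married", -1, "root"),
      ("to", 2, "prep"), ("Barack", 3, "pobj")], 0, 4),
    ([("Victoria", 2, "nsubjpass"), ("was", 2, "auxpass"), ("married", -1, "root"),
      ("to", 2, "prep"), ("Albert", 3, "pobj")], 0, 4),
    ([("Albert", 3, "nsubj"), ("is", 3, "cop"), ("the", 3, "det"),
      ("husband", -1, "root"), ("of", 3, "prep"), ("Victoria", 4, "pobj")], 0, 5),
]


def dependency_path(parse, start, end):
    """Return [(label, token_index), ...] along the tree path from start to end."""
    # Treat the dependency tree as an undirected graph.
    neighbours = {i: [] for i in range(len(parse))}
    for i, (_, head, label) in enumerate(parse):
        if head >= 0:
            neighbours[i].append((head, label))
            neighbours[head].append((i, label))
    # Breadth-first search; in a tree the path found is the only one.
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == end:
            return path
        for nxt, label in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(label, nxt)]))
    return None


def pattern(parse, subj_idx, obj_idx):
    """Render the path as a string, replacing the two arguments with X and Y."""
    path = dependency_path(parse, subj_idx, obj_idx)
    if path is None:
        return None
    parts = ["X"]
    for label, idx in path:
        parts.append(f"-{label}-")
        parts.append("Y" if idx == obj_idx else parse[idx][0])
    return " ".join(parts)


def count_patterns(sentences):
    """Count how often each generalized path occurs across the sentences."""
    return Counter(pattern(p, s, o) for p, s, o in sentences)


counts = count_patterns(SENTENCES)
for pat, freq in counts.most_common():
    print(freq, pat)
```

In this sketch, frequent patterns would be the candidates a lexicon engineer inspects; turning a pattern into an actual lemon lexicon entry is deliberately left out, since the abstract gives no detail on that step.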