Abstract
Linked Data has reshaped knowledge representation by introducing the triple data structure, which encodes knowledge together with its semantics and context by interlinking with external resources across documents. Although Linked Data is an attractive and effective mechanism for representing knowledge that humans create and consume in natural language, it remains a step removed from natural language itself. Hence, there has recently been increased interest in transforming Linked Data into natural language so that applications interacting with natural language can harness its benefits. This paper presents a framework that lexicalizes Linked Data triples into natural language using an ensemble architecture. The proposed architecture comprises four pattern-based modules that lexicalize triples by analysing triple features. The four pattern mining modules are based on occupational metonyms, Context Free Grammar (CFG), relation extraction using Open Information Extraction (OpenIE), and triple properties. The framework was evaluated using a two-fold process consisting of linguistic accuracy analysis and human evaluation of a test sample. The linguistic accuracy evaluation showed that the framework can produce 283 accurate lexicalization patterns for a set of 25 ontology classes, resulting in 70.75% accuracy, an approximately 91% increase over the existing state-of-the-art model.
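As a rough illustration of the ensemble idea, the following minimal sketch chains pattern modules and uses the first pattern any of them proposes. Everything here is an assumption made for illustration: the abstract does not state how the four modules are combined, so the first-match cascade, the `Triple` type, the `?s`/`?o` slot notation, the toy verb table, and the function names are all hypothetical. Only two of the four modules are mimicked (occupational metonyms and triple properties); the CFG and OpenIE modules are omitted.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


# A module proposes a pattern with ?s / ?o slots, or None if it cannot help.
Module = Callable[[Triple], Optional[str]]

# Toy lookup (assumption): occupational nouns mapped to a past-tense verb.
OCCUPATION_VERBS = {"director": "directed", "writer": "wrote"}


def metonym_module(t: Triple) -> Optional[str]:
    """Hypothetical stand-in for the occupational metonym module."""
    verb = OCCUPATION_VERBS.get(t.predicate)
    return f"?o {verb} ?s" if verb else None


def property_module(t: Triple) -> Optional[str]:
    """Hypothetical stand-in for the triple property module (generic fallback)."""
    return f"?s's {t.predicate} is ?o"


def lexicalize(t: Triple, modules: Sequence[Module]) -> Optional[str]:
    """Try each module in order and fill the first pattern returned."""
    for module in modules:
        pattern = module(t)
        if pattern is not None:
            return pattern.replace("?s", t.subject).replace("?o", t.obj)
    return None


MODULES = [metonym_module, property_module]
film = lexicalize(Triple("Inception", "director", "Christopher Nolan"), MODULES)
city = lexicalize(Triple("Berlin", "mayor", "Kai Wegner"), MODULES)
```

Here `film` is produced by the metonym stand-in and `city` falls through to the property fallback. A real system would need ranking or validation of competing patterns rather than a simple first-match rule.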
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Perera, R., Nand, P. (2018). An Ensemble Architecture for Linked Data Lexicalization. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol 10761. Springer, Cham. https://doi.org/10.1007/978-3-319-77113-7_34
DOI: https://doi.org/10.1007/978-3-319-77113-7_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77112-0
Online ISBN: 978-3-319-77113-7
eBook Packages: Computer Science, Computer Science (R0)