Abstract
Text attributes in databases contain rich semantic information that is seldom processed or used. This paper proposes a method to extract concepts from texts stored in databases and represent them semantically. The process relies on tools such as WordNet and Wikipedia to identify the extracted concepts and represent them as a basic ontology whose concepts are annotated with search terms. This ontology can play several roles. It can serve as a conceptual summary of an attribute's content and as a means of navigating that content. It can also serve as a profile for text search, using the terms associated with the ontology concepts. The ontology is built as a subset of the Wikipedia category graph, selected using several metrics. Category selection using these metrics is discussed, and an example application is presented and evaluated.
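To make the idea of "an ontology built as a subset of the category graph, annotated with search terms" concrete, the following is a minimal sketch, not the authors' method. The abstract does not specify the selection metrics, so a simple coverage count (how many distinct terms reach a category) stands in for them. The data in `TERM_CATEGORIES` and `PARENTS`, the function names, and the `min_terms` threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: extracted term -> categories it belongs to.
TERM_CATEGORIES = {
    "espresso": ["Coffee drinks"],
    "latte": ["Coffee drinks"],
    "green tea": ["Tea"],
    "croissant": ["Pastries"],
}

# Hypothetical category graph: category -> parent categories.
PARENTS = {
    "Coffee drinks": ["Hot drinks"],
    "Tea": ["Hot drinks"],
    "Hot drinks": ["Drinks"],
    "Pastries": ["Foods"],
    "Drinks": [],
    "Foods": [],
}


def ancestors(category):
    """Return the category itself plus every category reachable upwards."""
    seen = set()
    stack = [category]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def build_ontology(term_categories, min_terms=2):
    """Select categories covering at least `min_terms` distinct terms.

    Coverage is an assumed stand-in metric, not one taken from the paper.
    Returns the selected categories annotated with their search terms,
    and the graph edges whose two endpoints were both selected.
    """
    terms_by_category = defaultdict(set)
    for term, categories in term_categories.items():
        for category in categories:
            for node in ancestors(category):
                terms_by_category[node].add(term)

    selected = {
        category: sorted(terms)
        for category, terms in terms_by_category.items()
        if len(terms) >= min_terms
    }
    edges = sorted(
        (child, parent)
        for child in selected
        for parent in PARENTS.get(child, [])
        if parent in selected
    )
    return selected, edges


ontology, edges = build_ontology(TERM_CATEGORIES)
for category, terms in sorted(ontology.items()):
    print(category, "->", terms)
```

In this sketch each selected category carries the terms that led to it, so the same structure can be browsed as a summary or used to expand a text search with the annotated terms.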
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Campaña, J.R., Medina, J.M., Vila, M.A. (2011). Semantic Processing of Database Textual Attributes Using Wikipedia. In: Christiansen, H., De Tré, G., Yazici, A., Zadrozny, S., Andreasen, T., Larsen, H.L. (eds) Flexible Query Answering Systems. FQAS 2011. Lecture Notes in Computer Science, vol 7022. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24764-4_8
DOI: https://doi.org/10.1007/978-3-642-24764-4_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24763-7
Online ISBN: 978-3-642-24764-4
eBook Packages: Computer Science