Semantic Processing of Database Textual Attributes Using Wikipedia

Campaña, Jesús R.; Medina, Juan M.; Vila, M. Amparo

doi:10.1007/978-3-642-24764-4_8

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7022)

Included in the following conference series:

International Conference on Flexible Query Answering Systems

Abstract

Text attributes in databases contain rich semantic information that is seldom processed or used. This paper proposes a method to extract concepts from texts stored in databases and represent them semantically. The process relies on tools such as WordNet and Wikipedia to identify the extracted concepts and represent them as a basic ontology whose concepts are annotated with search terms. This ontology can play several roles. It can serve as a conceptual summary of an attribute's content and as a means of navigating through that content. It can also be used as a profile for text search, using the terms associated with the ontology concepts. The ontology is built as a subset of the Wikipedia category graph, selected using several metrics. Category selection using these metrics is discussed, and an example application is presented and evaluated.
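
As a rough illustration of the category-selection idea in the abstract, the sketch below scores each category of a toy graph against the terms found in a text attribute and keeps the induced subgraph of the categories that pass a threshold. The abstract does not say which metrics the paper uses, so the Jaccard index, the threshold value, the category names, the term sets, and the helper names `select_categories` and `induced_subgraph` are all assumptions made for this example, not the authors' method.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_categories(
    category_terms: Dict[str, Set[str]],
    attribute_terms: Set[str],
    threshold: float,
) -> Dict[str, float]:
    """Return the categories whose term overlap with the attribute meets the threshold."""
    scores = {
        name: jaccard(terms, attribute_terms)
        for name, terms in category_terms.items()
    }
    return {name: score for name, score in scores.items() if score >= threshold}


def induced_subgraph(
    edges: List[Tuple[str, str]], keep: Set[str]
) -> List[Tuple[str, str]]:
    """Keep only (child, parent) edges whose two endpoints were both selected."""
    return [(child, parent) for child, parent in edges if child in keep and parent in keep]


# Hypothetical toy data: category names, their terms, and the edges are invented.
category_terms = {
    "Databases": {"database", "query", "index"},
    "Query languages": {"query", "search", "ranking"},
    "Botany": {"plant", "flora"},
}
edges = [("Query languages", "Databases"), ("Botany", "Biology")]

# Terms assumed to have been extracted from one textual attribute.
attribute_terms = {"database", "query", "search"}

selected = select_categories(category_terms, attribute_terms, threshold=0.4)
ontology_edges = induced_subgraph(edges, set(selected))
```

With this toy data, the two computing-related categories each share two of four terms with the attribute and are kept, while "Botany" shares none and is dropped. Any other set-similarity or graph-based metric could be swapped in for `jaccard` without changing the selection step.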

Author information

Authors and Affiliations

Dept. of Computer Science and Artificial Intelligence, University of Granada, Daniel Saucedo Aranda s/n, 18071, Granada, Spain
Jesús R. Campaña, Juan M. Medina & M. Amparo Vila

Editor information

Editors and Affiliations

Department of Communication, Business and Information Technologies, Roskilde University, P.O. box 260, 4000, Roskilde, Denmark
Henning Christiansen
Department of Telecommunications and Information Processing, Ghent University, Sint-Pietersnieuwstraat 41, 9000, Ghent, Belgium
Guy De Tré
Computer Engineering Department, Middle East Technical University (METU), 06531, Ankara, Türkiye
Adnan Yazici
Systems Research Institute, Polish Academy of Sciences, Newelska 6, 01-447, Warsaw, Poland
Slawomir Zadrozny
Department of Computer Science, Roskilde University, Building 42.1, P.O. Box 260, 4000, Roskilde, Denmark
Troels Andreasen
Department of Electronic Systems, Aalborg University, Niels Bohrs Vej 8, H321, 6700, Esbjerg, Denmark
Henrik Legind Larsen

About this paper

Cite this paper

Campaña, J.R., Medina, J.M., Vila, M.A. (2011). Semantic Processing of Database Textual Attributes Using Wikipedia. In: Christiansen, H., De Tré, G., Yazici, A., Zadrozny, S., Andreasen, T., Larsen, H.L. (eds) Flexible Query Answering Systems. FQAS 2011. Lecture Notes in Computer Science, vol 7022. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24764-4_8

DOI: https://doi.org/10.1007/978-3-642-24764-4_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24763-7
Online ISBN: 978-3-642-24764-4
eBook Packages: Computer Science, Computer Science (R0)
