Semantic Processing of Database Textual Attributes Using Wikipedia

  • Conference paper
Flexible Query Answering Systems (FQAS 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7022)

Abstract

Text attributes in databases contain rich semantic information that is seldom processed or used. This paper proposes a method to extract concepts from texts stored in databases and represent them semantically. The process relies on tools such as WordNet and Wikipedia to identify the concepts extracted from the texts and to represent them as a basic ontology whose concepts are annotated with search terms. This ontology can play diverse roles. It can be seen as a conceptual summary of the content of an attribute and used to navigate through that attribute's textual content. It can also be used as a profile for text search, using the terms associated with the ontology concepts. The ontology is built as a subset of the Wikipedia category graph, selected using diverse metrics. Category selection using these metrics is discussed, and an example application is presented and evaluated.
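The selection step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy category graph, the term-to-category mapping, the function name `build_profile`, and the coverage-count metric (number of distinct terms a category covers) are hypothetical and are not the metrics used in the paper, which the abstract does not specify.

```python
from collections import defaultdict


def build_profile(term_categories, parents, min_terms=2):
    """Select categories that cover at least `min_terms` distinct terms.

    term_categories: term -> set of categories the term maps to directly
                     (hypothetical toy data, not real Wikipedia output)
    parents:         category -> set of parent categories (toy graph)

    The coverage count used here is an illustrative stand-in for the
    paper's selection metrics, which are not given in the abstract.
    """
    covered = defaultdict(set)  # category -> terms found at or below it
    for term, cats in term_categories.items():
        stack = list(cats)
        seen = set()  # guards against cycles in the category graph
        while stack:
            cat = stack.pop()
            if cat in seen:
                continue
            seen.add(cat)
            covered[cat].add(term)
            stack.extend(parents.get(cat, ()))

    # Keep only well-covered categories, annotated with their search terms.
    selected = {c: sorted(t) for c, t in covered.items() if len(t) >= min_terms}
    # Keep only the edges whose endpoints both survived the selection.
    edges = sorted(
        (c, p) for c in selected for p in parents.get(c, ()) if p in selected
    )
    return selected, edges


# Toy input: three terms extracted from a text attribute.
term_categories = {
    "paella": {"Rice dishes"},
    "risotto": {"Rice dishes"},
    "gazpacho": {"Soups"},
}
parents = {"Rice dishes": {"Foods"}, "Soups": {"Foods"}}

selected, edges = build_profile(term_categories, parents, min_terms=2)
```

With this toy input, "Soups" is dropped because it covers a single term, while "Rice dishes" and "Foods" are kept together with the terms they cover. The resulting dictionary plays the role of the annotated concept summary, and its term lists could serve as a search profile.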

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Campaña, J.R., Medina, J.M., Vila, M.A. (2011). Semantic Processing of Database Textual Attributes Using Wikipedia. In: Christiansen, H., De Tré, G., Yazici, A., Zadrozny, S., Andreasen, T., Larsen, H.L. (eds) Flexible Query Answering Systems. FQAS 2011. Lecture Notes in Computer Science, vol 7022. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24764-4_8

  • DOI: https://doi.org/10.1007/978-3-642-24764-4_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24763-7

  • Online ISBN: 978-3-642-24764-4

  • eBook Packages: Computer Science, Computer Science (R0)
