Skip to main content

A Hybrid Model for Learning Semantic Relatedness Using Wikipedia-Based Features

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8786))

Abstract

Semantic relatedness computation is the task of quantifying the degree of relatedness of two concepts. The performance of existing approaches to computing semantic relatedness is highly dependent on particular aspects of relatedness. For instance, taxonomy-based approaches aim at computing similarity, which is a special case of semantic relatedness. On the other hand, corpus-based approaches focus on the associative relations of words by taking their distributional features into account. Based on the assumption that different aspects of knowledge sources cover different kinds of semantic relations, this paper presents a hybrid model for computing semantic relatedness of words using new features extracted from various aspects of Wikipedia. The focus of this paper is on finding the optimal feature combination(s) that enhance the performance of the hybrid model. The empirical evaluation on benchmark datasets has shown that hybrid features perform better than single features by providing a complementary coverage of semantic relations, leading to improved correlation with human judgments.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Hofmann, T.: Probabilistic latent semantic indexing. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 1999), pp. 50–57 (1999)

    Google Scholar 

  2. Patwardhan, S., Banerjee, S., Pedersen, T.: Using measures of semantic relatedness for word sense disambiguation. In: Gelbukh, A. (ed.) CICLing 2003. LNCS, vol. 2588, pp. 241–257. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  3. Schonhofen, P.: Identifying document topics using the wikipedia category network. In: Proceedings of the International Conference on Web Intelligence (WI 2006), pp. 456–462. IEEE Computer Society (2006)

    Google Scholar 

  4. Huang, A., Milne, D., Frank, E., Witten, I.H.: Clustering documents using a wikipedia-based concept representation. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, T.-B. (eds.) PAKDD 2009. LNCS, vol. 5476, pp. 628–636. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  5. Yih, W., Qazvinian, V.: Measuring word relatedness using heterogeneous vector space models. In: Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2012), pp. 616–620 (2012)

    Google Scholar 

  6. Budanitsky, A., Hirst, G.: Evaluating wordnet-based measures of lexical semantic relatedness. Comput. Linguist. 32, 13–47 (2006)

    Article  MATH  Google Scholar 

  7. Agirre, E., Alfonseca, E., Hall, K., Kravalova, J., Paşca, M., Soroa, A.: A study on similarity and relatedness using distributional and wordnet-based approaches. In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2009), pp. 19–27 (2009)

    Google Scholar 

  8. Milne, D., Witten, I.H.: An effective, low-cost measure of semantic relatedness obtained from wikipedia links. In: Proceeding of AAAI Workshop on Wikipedia and Artificial Intelligence: an Evolving Synergy, pp. 25–30 (2008)

    Google Scholar 

  9. Navigli, R., Ponzetto, S.P.: Babelrelate! a joint multilingual approach to computing semantic relatedness. In: Proceedings of the Twenty-Sixth Conference on Artificial Intelligence, AAAI 2012 (2012)

    Google Scholar 

  10. Yazdani, M., Popescu-Belis, A.: Computing text semantic relatedness using the contents and links of a hypertext encyclopedia. Artif. Intell. 194, 176–202 (2013)

    Article  MathSciNet  MATH  Google Scholar 

  11. Bollegala, D., Matsuo, Y., Ishizuka, M.: A web search engine-based approach to measure semantic similarity between words. IEEE Trans. on Knowl. and Data Eng. 23(7), 977–990 (2011)

    Article  Google Scholar 

  12. Sahami, M., Heilman, T.D.: A web-based kernel function for measuring the similarity of short text snippets. In: Proceedings of the 15th International Conference on World Wide Web (WWW 2006), pp. 377–386 (2006)

    Google Scholar 

  13. Hassan, S., Banea, C., Mihalcea, R.: Measuring semantic relatedness using multilingual representations. In: Proceedings of the First Joint Conference on Lexical and Computational Semantics (SemEval 2012), pp. 20–29 (2012)

    Google Scholar 

  14. Jarmasz, M., Szpakowicz, S.: Roget’s thesaurus: a lexical resource to treasure. CoRR (2012)

    Google Scholar 

  15. Gabrilovich, E., Markovitch, S.: Computing semantic relatedness using wikipedia-based explicit semantic analysis. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI 2007), pp. 1606–1611 (2007)

    Google Scholar 

  16. Ponzetto, S.P., Strube, M.: Knowledge derived from wikipedia for computing semantic relatedness. J. Artif. Intell. Res. (JAIR) 30, 181–212 (2007)

    Google Scholar 

  17. Mihalcea, R., Corley, C., Strapparava, C.: Corpus-based and knowledge-based measures of text semantic similarity. In: Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI 2006), pp. 775–780 (2006)

    Google Scholar 

  18. Milne, D., Witten, I.H.: An open-source toolkit for mining wikipedia. Artificial Intelligence 194, 222–239 (2013); Artificial Intelligence, Wikipedia and Semi-Structured Resources.

    Google Scholar 

  19. Mihalcea, R., Csomai, A.: Wikify!: linking documents to encyclopedic knowledge. In: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM 2007), pp. 233–242 (2007)

    Google Scholar 

  20. Jabeen, S., Gao, X., Andreae, P.: Directional Context Helps: Guiding Semantic Relatedness Computation by Asymmetric Word Associations. In: Lin, X., Manolopoulos, Y., Srivastava, D., Huang, G. (eds.) WISE 2013, Part I. LNCS, vol. 8180, pp. 92–101. Springer, Heidelberg (2013)

    Chapter  Google Scholar 

  21. Landauer, T.K., Dumais, S.T.: A solution to plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 211–240 (1997)

    Google Scholar 

  22. Han, L., Finin, T., McNamee, P., Joshi, A., Yesha, Y.: Improving word similarity by augmenting pmi with estimates of word polysemy. IEEE Trans. Knowl. Data Eng. 25(6), 1307–1322 (2013)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Jabeen, S., Gao, X., Andreae, P. (2014). A Hybrid Model for Learning Semantic Relatedness Using Wikipedia-Based Features. In: Benatallah, B., Bestavros, A., Manolopoulos, Y., Vakali, A., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2014. WISE 2014. Lecture Notes in Computer Science, vol 8786. Springer, Cham. https://doi.org/10.1007/978-3-319-11749-2_39

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-11749-2_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11748-5

  • Online ISBN: 978-3-319-11749-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics