A Hybrid Model for Learning Semantic Relatedness Using Wikipedia-Based Features

Jabeen, Shahida; Gao, Xiaoying; Andreae, Peter

doi:10.1007/978-3-319-11749-2_39

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8786)

Abstract

Semantic relatedness computation is the task of quantifying the degree of relatedness of two concepts. The performance of existing approaches to computing semantic relatedness is highly dependent on particular aspects of relatedness. For instance, taxonomy-based approaches aim at computing similarity, which is a special case of semantic relatedness. On the other hand, corpus-based approaches focus on the associative relations of words by taking their distributional features into account. Based on the assumption that different aspects of knowledge sources cover different kinds of semantic relations, this paper presents a hybrid model for computing semantic relatedness of words using new features extracted from various aspects of Wikipedia. The focus of this paper is on finding the optimal feature combination(s) that enhance the performance of the hybrid model. The empirical evaluation on benchmark datasets has shown that hybrid features perform better than single features by providing a complementary coverage of semantic relations, leading to improved correlation with human judgments.
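The idea of searching for a feature combination that correlates best with human judgments can be illustrated with a minimal sketch. The abstract does not describe the actual features or the learning method, so everything here is an assumption: the feature names (`link_overlap`, `category_path`), the toy scores, and the combination rule (a plain average scored by Spearman rank correlation) are hypothetical stand-ins, not the authors' model.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def best_combination(features, human):
    """Try every non-empty feature subset, average its scores per word
    pair, and return the subset with the highest Spearman correlation
    against the human scores, together with that correlation."""
    best_combo, best_rho = None, float("-inf")
    names = sorted(features)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            combined = [
                sum(features[name][i] for name in combo) / size
                for i in range(len(human))
            ]
            rho = spearman(combined, human)
            if rho > best_rho:
                best_combo, best_rho = combo, rho
    return best_combo, best_rho


# Toy scores for four word pairs, invented purely for illustration.
toy_features = {
    "link_overlap": [0.1, 0.4, 0.3, 0.9],    # hypothetical feature
    "category_path": [0.2, 0.1, 0.6, 0.7],   # hypothetical feature
}
toy_human = [1.0, 2.0, 3.0, 4.0]

combo, rho = best_combination(toy_features, toy_human)
```

In this toy setup each feature alone misorders one pair of items, while their average reproduces the human ranking, which mirrors the abstract's claim that hybrid features give complementary coverage. A real system would replace the exhaustive average with a trained model and evaluate on held-out benchmark data.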

Author information

Authors and Affiliations

School of Engineering and Computer Science, Victoria University of Wellington, P.O. Box 600, Wellington, New Zealand
Shahida Jabeen, Xiaoying Gao & Peter Andreae

Editor information

Editors and Affiliations

University of New South Wales, Sydney, Australia
Boualem Benatallah
Boston University, Boston, MA, USA
Azer Bestavros
Aristotle University of Thessaloniki, Thessaloniki, Greece
Yannis Manolopoulos & Athena Vakali
Victoria University, Footscray, VIC, Australia
Yanchun Zhang

About this paper

Cite this paper

Jabeen, S., Gao, X., Andreae, P. (2014). A Hybrid Model for Learning Semantic Relatedness Using Wikipedia-Based Features. In: Benatallah, B., Bestavros, A., Manolopoulos, Y., Vakali, A., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2014. WISE 2014. Lecture Notes in Computer Science, vol 8786. Springer, Cham. https://doi.org/10.1007/978-3-319-11749-2_39

DOI: https://doi.org/10.1007/978-3-319-11749-2_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11748-5
Online ISBN: 978-3-319-11749-2
eBook Packages: Computer Science, Computer Science (R0)
