Abstract
Wikipedia provides rich semantic features, e.g., text, links, and category structure. These features can be used to compute semantic similarity (SS) between words or concepts. However, some existing Wikipedia-based SS methods either rely on a single feature or do not incorporate the underlying statistics of the different features. We propose novel vector representations of Wikipedia concepts that integrate multiple semantic features. We use the statistics available for these features in Wikipedia to compute their weights. Each weight signifies the contribution of a feature to similarity evaluation according to its level of importance. The experimental evaluation shows that our new methods obtain better results on SS datasets than state-of-the-art SS methods.
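The general idea of combining several weighted feature vectors can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not say how the weights are derived from Wikipedia statistics or how the vectors are built, so the feature names (`text`, `links`, `categories`), the toy vectors, the fixed weights, and the use of cosine similarity are all assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def weighted_similarity(concept_a, concept_b, feature_weights):
    """Weighted average of per-feature cosine similarities.

    concept_a, concept_b: dicts mapping a feature name to a sparse vector.
    feature_weights: dict mapping a feature name to a non-negative weight
    (hypothetical values here; the paper derives its weights from
    Wikipedia statistics in a way the abstract does not specify).
    """
    total = sum(feature_weights.values())
    if total == 0:
        return 0.0
    score = 0.0
    for feature, weight in feature_weights.items():
        vec_a = concept_a.get(feature, {})
        vec_b = concept_b.get(feature, {})
        score += weight * cosine(vec_a, vec_b)
    return score / total


# Toy, hand-made data; not taken from Wikipedia or from the paper.
car = {
    "text": {"vehicle": 2.0, "engine": 1.0},
    "links": {"Engine": 1.0, "Road": 1.0},
    "categories": {"Vehicles": 1.0},
}
truck = {
    "text": {"vehicle": 1.0, "cargo": 1.0},
    "links": {"Engine": 1.0, "Freight": 1.0},
    "categories": {"Vehicles": 1.0},
}
banana = {
    "text": {"fruit": 1.0},
    "links": {"Plant": 1.0},
    "categories": {"Fruits": 1.0},
}
weights = {"text": 0.5, "links": 0.3, "categories": 0.2}

sim_car_truck = weighted_similarity(car, truck, weights)
sim_car_banana = weighted_similarity(car, banana, weights)
print(sim_car_truck > sim_car_banana)
```

With these toy inputs, the two vehicle concepts share terms in every feature and therefore score higher than the unrelated pair, which shares none. Changing the weights shifts how much each feature contributes to the final score.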
Acknowledgments
This work is supported by the National Natural Science Foundation of China under Grant Nos. 61772210 and U1911201; the Guangdong Province Universities Pearl River Scholar Funded Scheme (2018); and the Project of Science and Technology in Guangzhou in China under Grant No. 201807010043.
Copyright information
© 2020 IFIP International Federation for Information Processing
About this paper
Cite this paper
Wasti, S., Hussain, J., Huang, G., Jiang, Y. (2020). Similarity Evaluation with Wikipedia Features. In: Shi, Z., Vadera, S., Chang, E. (eds) Intelligent Information Processing X. IIP 2020. IFIP Advances in Information and Communication Technology, vol 581. Springer, Cham. https://doi.org/10.1007/978-3-030-46931-3_10
DOI: https://doi.org/10.1007/978-3-030-46931-3_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46930-6
Online ISBN: 978-3-030-46931-3
eBook Packages: Computer Science, Computer Science (R0)