A hybrid approach for measuring semantic similarity based on IC-weighted path distance in WordNet

Journal of Intelligent Information Systems

Abstract

As a valuable tool for text understanding, semantic similarity measurement enables discriminative semantic-based applications in natural language processing, information retrieval, computational linguistics and artificial intelligence. Most existing studies use structured taxonomies such as WordNet to explore lexical semantic relationships; however, improving computation accuracy remains a challenge for them. To address this problem, we propose CSSM-ICSP, a hybrid WordNet-based approach to measuring concept semantic similarity, which leverages the information content (IC) of concepts to weight the shortest path distance between concepts. To improve the performance of IC computation, we also develop a novel model of the intrinsic IC of concepts that takes into consideration a variety of semantic properties involved in the structure of WordNet. In addition, we summarize and classify the technical characteristics of previous WordNet-based approaches and evaluate our approach against them on various benchmarks. In terms of the correlation coefficient, the experimental results of the proposed approach are more correlated with human judgments of similarity, which indicates that our IC model and similarity detection approach are comparable to or even better than the others for semantic similarity measurement.
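To make the idea of an IC-weighted path distance concrete, the following is a minimal sketch and not the CSSM-ICSP formulation itself, which the abstract does not spell out. It assumes a hypothetical six-concept toy taxonomy in place of WordNet, a commonly used hyponym-count intrinsic IC of the form 1 - log(hypo(c) + 1) / log(N), an edge weight equal to the IC difference between child and parent, and a simple 1 / (1 + d) mapping from distance to similarity. All names (`PARENT`, `intrinsic_ic`, `ic_weighted_distance`, `similarity`) are illustrative.

```python
import math

# Hypothetical toy taxonomy (child -> parent); an assumption, not WordNet data.
PARENT = {
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "tree": "plant",
}
CONCEPTS = set(PARENT) | set(PARENT.values())


def ancestors(concept):
    """Return the chain [concept, parent, ..., root]."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def hyponym_count(concept):
    """Count the proper descendants of a concept."""
    return sum(1 for c in CONCEPTS if c != concept and concept in ancestors(c))


def intrinsic_ic(concept):
    """Assumed intrinsic IC: 1 - log(hypo(c) + 1) / log(N)."""
    return 1.0 - math.log(hyponym_count(concept) + 1) / math.log(len(CONCEPTS))


def ic_weighted_distance(a, b):
    """Sum |IC(child) - IC(parent)| over the edges of the path a -> LCS -> b."""
    up_a, up_b = ancestors(a), ancestors(b)
    lcs = next(c for c in up_a if c in up_b)  # lowest common subsumer
    total = 0.0
    for chain in (up_a, up_b):
        for child in chain[: chain.index(lcs)]:
            total += abs(intrinsic_ic(child) - intrinsic_ic(PARENT[child]))
    return total


def similarity(a, b):
    """Assumed mapping from distance to a similarity score in (0, 1]."""
    return 1.0 / (1.0 + ic_weighted_distance(a, b))
```

Under these assumptions, `similarity("dog", "cat")` is higher than `similarity("dog", "tree")`, because the path between the siblings stays below the more informative concept "animal" instead of climbing to the root. With this particular edge weight the sum telescopes to IC(a) + IC(b) - 2 IC(LCS), so a hybrid measure would need a different weighting to go beyond a purely IC-based distance.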



Acknowledgements

The authors would like to thank the reviewers for their valuable comments and suggestions. This study is supported by the National Natural Science Foundation of China (No. 61502028), the National Key Technology R&D Program of China (No. 2015BAK36B04), the Training Program Foundation for the Talents of Beijing (No. 2015000020124G029), the Beijing Natural Science Foundation (No. 4172014) and the Research Foundation for Youth Scholars of Beijing Technology and Business University.

Author information

Corresponding author

Correspondence to Qingchuan Zhang.

Ethics declarations

I certify that this manuscript is original, has not been published, and will not be submitted elsewhere for publication while it is being considered by the Journal of Intelligent Information Systems. The study has not been split into several parts to increase the number of submissions, nor submitted to various journals or to one journal over time. No data (including images) have been fabricated or manipulated to support our conclusions. No data, text, or theories by others are presented as if they were our own. Authors whose names appear on the manuscript have contributed sufficiently to the scientific work and therefore share collective responsibility and accountability for the results. In addition, consent to submit has been received explicitly from all co-authors, as well as from the responsible authorities, tacitly or explicitly, at the institute where the work was carried out, before the work was submitted. Authors are strongly advised to ensure the correct author group, corresponding author, and order of authors at submission.

Conflict of interests

The authors declare that they have no conflict of interest.

Funding

This study is funded by the National Natural Science Foundation of China (No. 61502028), the National Key Technology R&D Program of China (No. 2015BAK36B04), the Training Program Foundation for the Talents of Beijing (No. 2015000020124G029), the Beijing Natural Science Foundation (No. 4172014) and the Research Foundation for Youth Scholars of Beijing Technology and Business University.

Research involving Human Participants and/or Animals

No human participants or animals were involved in this work.


Cite this article

Cai, Y., Zhang, Q., Lu, W. et al. A hybrid approach for measuring semantic similarity based on IC-weighted path distance in WordNet. J Intell Inf Syst 51, 23–47 (2018). https://doi.org/10.1007/s10844-017-0479-y


  • DOI: https://doi.org/10.1007/s10844-017-0479-y
