Abstract
Computing the semantic similarity between concepts is a common problem in many language-related tasks and knowledge domains. In the biomedical field, several approaches have been developed to address it by exploiting the knowledge available in domain ontologies (SNOMED-CT) and in specific, closed and reliable corpora (clinical data). In recent years, however, the enormous growth of the Web has motivated researchers to use it as the base corpus to assist the semantic analysis of language. This paper proposes and evaluates the use of the Web as a background corpus for measuring the similarity of biomedical concepts. Several classical similarity measures have been considered and tested, using a benchmark composed of biomedical terms and comparing the results against approaches in which specific clinical data were used. The results show that the similarity values obtained from the Web are even more reliable than those obtained from specific clinical data, which supports the suitability of the Web as an information corpus for the biomedical domain.
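As an illustration only, the following minimal sketch shows how Web page counts could feed classical information-content measures. The abstract does not state which measures or which search engine were used, so the choice of the Resnik, Lin and Jiang-Conrath formulas, the `HITS` table, the `TOTAL_PAGES` constant and all function names are assumptions made for this example, and the counts are invented placeholders rather than real data.

```python
import math

# Hypothetical values: a real system would obtain page counts from a Web
# search engine. These numbers are invented for illustration only.
TOTAL_PAGES = 1e10
HITS = {
    "disease": 5e8,
    "heart disease": 2e7,
    "myocardial infarction": 5e6,
    "angina": 8e6,
}


def information_content(term, hits=HITS, total=TOTAL_PAGES):
    """IC(c) = -log p(c), with p(c) estimated as hits(c) / total pages."""
    return -math.log(hits[term] / total)


def resnik(lcs):
    """Resnik similarity: the IC of the least common subsumer (LCS)."""
    return information_content(lcs)


def lin(a, b, lcs):
    """Lin similarity: 2 * IC(lcs) / (IC(a) + IC(b))."""
    return 2 * information_content(lcs) / (
        information_content(a) + information_content(b)
    )


def jiang_conrath_distance(a, b, lcs):
    """Jiang-Conrath distance: IC(a) + IC(b) - 2 * IC(lcs)."""
    return (
        information_content(a)
        + information_content(b)
        - 2 * information_content(lcs)
    )


if __name__ == "__main__":
    # The LCS is supplied by hand here; in practice it would come from an
    # ontology such as SNOMED-CT.
    pair = ("myocardial infarction", "angina")
    print("Resnik:", resnik("heart disease"))
    print("Lin:", lin(*pair, "heart disease"))
    print("Jiang-Conrath distance:", jiang_conrath_distance(*pair, "heart disease"))
```

In this sketch the ontology only supplies the least common subsumer of the two concepts, while the probabilities come entirely from the page counts, so the corpus can be swapped without changing the measures themselves.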
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sánchez, D., Batet, M., Valls, A. (2009). Computing Knowledge-Based Semantic Similarity from the Web: An Application to the Biomedical Domain. In: Karagiannis, D., Jin, Z. (eds) Knowledge Science, Engineering and Management. KSEM 2009. Lecture Notes in Computer Science, vol 5914. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10488-6_6
DOI: https://doi.org/10.1007/978-3-642-10488-6_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10487-9
Online ISBN: 978-3-642-10488-6
eBook Packages: Computer Science, Computer Science (R0)