Computing Knowledge-Based Semantic Similarity from the Web: An Application to the Biomedical Domain

Sánchez, David; Batet, Montserrat; Valls, Aida

doi:10.1007/978-3-642-10488-6_6

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5914)

Abstract

Computing the semantic similarity between concepts is a common problem in many language-related tasks and knowledge domains. In the biomedical field, several approaches have been developed to address it by exploiting the knowledge available in domain ontologies (such as SNOMED-CT) and in specific, closed and reliable corpora (such as clinical data). However, in recent years, the enormous growth of the Web has motivated researchers to use it as the base corpus to assist the semantic analysis of language. This paper proposes and evaluates the use of the Web as a background corpus for measuring the similarity of biomedical concepts. Several classical similarity measures have been considered and tested, using a benchmark composed of biomedical terms and comparing the results against approaches in which specific clinical data were used. Results show that the similarity values obtained from the Web are even more reliable than those obtained from specific clinical data, which indicates the suitability of the Web as an information corpus for the biomedical domain.
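
The abstract does not say which measures were tested or how Web statistics were gathered. As a rough illustration only, the sketch below assumes information-content measures in the style of Resnik, Lin, and Jiang and Conrath, and assumes that a concept's probability is estimated from search-engine hit counts divided by an assumed total number of indexed pages. All function names and counts are hypothetical and are not taken from the paper.

```python
import math


def information_content(hits: int, total_pages: int) -> float:
    """IC(c) = -log2 p(c), with p(c) estimated as hits / total_pages (an assumption)."""
    if hits <= 0 or total_pages <= 0 or hits > total_pages:
        raise ValueError("need 0 < hits <= total_pages")
    return -math.log2(hits / total_pages)


def resnik_similarity(ic_lcs: float) -> float:
    """Resnik-style similarity: the IC of the least common subsumer (LCS)."""
    return ic_lcs


def lin_similarity(ic_a: float, ic_b: float, ic_lcs: float) -> float:
    """Lin-style similarity: 2 * IC(lcs) / (IC(a) + IC(b))."""
    denominator = ic_a + ic_b
    # Both concepts carry no information; return 0.0 to avoid dividing by zero.
    return 0.0 if denominator == 0 else 2 * ic_lcs / denominator


def jiang_conrath_distance(ic_a: float, ic_b: float, ic_lcs: float) -> float:
    """Jiang-Conrath-style distance: IC(a) + IC(b) - 2 * IC(lcs)."""
    return ic_a + ic_b - 2 * ic_lcs


# Hypothetical hit counts, chosen as powers of two so the logarithms are exact.
TOTAL_PAGES = 2 ** 30
ic_term_a = information_content(2 ** 20, TOTAL_PAGES)    # a specific term
ic_term_b = information_content(2 ** 18, TOTAL_PAGES)    # a rarer specific term
ic_subsumer = information_content(2 ** 24, TOTAL_PAGES)  # their common ancestor

print(resnik_similarity(ic_subsumer))
print(lin_similarity(ic_term_a, ic_term_b, ic_subsumer))
print(jiang_conrath_distance(ic_term_a, ic_term_b, ic_subsumer))
```

In this sketch the ontology supplies only the common ancestor of the two terms, while all frequencies come from the corpus, so replacing clinical data with the Web changes nothing but the source of the hit counts.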

Author information

Authors and Affiliations

Department of Computer Science and Mathematics, University Rovira i Virgili, Av. Països Catalans, 26, 43007, Tarragona
David Sánchez, Montserrat Batet & Aida Valls

Editor information

Editors and Affiliations

Faculty of Computer Science, Institute for Knowledge and Business Engineering, University of Vienna, Brünner Straße 72, 1210, Vienna, Austria
Dimitris Karagiannis
School of Electronic Engineering and Computer Science, Peking University, No. 5 Yiheyuan Road, 100871, Beijing, China
Zhi Jin

About this paper

Cite this paper

Sánchez, D., Batet, M., Valls, A. (2009). Computing Knowledge-Based Semantic Similarity from the Web: An Application to the Biomedical Domain. In: Karagiannis, D., Jin, Z. (eds) Knowledge Science, Engineering and Management. KSEM 2009. Lecture Notes in Computer Science, vol 5914. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10488-6_6

DOI: https://doi.org/10.1007/978-3-642-10488-6_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10487-9
Online ISBN: 978-3-642-10488-6
eBook Packages: Computer Science, Computer Science (R0)
