
Computing Knowledge-Based Semantic Similarity from the Web: An Application to the Biomedical Domain

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5914)

Abstract

Computing the semantic similarity between concepts is a common problem in many language-related tasks and knowledge domains. In the biomedical field, several approaches have been developed to address it by exploiting the knowledge available in domain ontologies (SNOMED-CT) and in specific, closed and reliable corpora (clinical data). In recent years, however, the enormous growth of the Web has motivated researchers to use it as the base corpus to assist the semantic analysis of language. This paper proposes and evaluates the use of the Web as a background corpus for measuring the similarity of biomedical concepts. Several classical similarity measures have been considered and tested, using a benchmark composed of biomedical terms and comparing the results against approaches in which specific clinical data were used. The results show that the similarity values obtained from the Web are even more reliable than those obtained from specific clinical data, demonstrating the suitability of the Web as an information corpus for the biomedical domain.
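As a rough illustration of what using the Web as a background corpus can look like, the sketch below estimates a term's information content from search-engine hit counts and plugs it into three classical information-content measures (Resnik, Lin, Jiang and Conrath). It is a minimal sketch under stated assumptions, not the paper's procedure: the abstract does not specify how hit counts are obtained or combined, so the function names, the page total and the way the least common subsumer (LCS) is supplied are all hypothetical.

```python
import math


def information_content(hits: int, total_pages: int) -> float:
    """IC(c) = -log2 p(c), with p(c) estimated as hits / total_pages.

    Assumption: `hits` is the number of pages a search engine reports for
    the term and `total_pages` is an estimate of the indexed Web size.
    """
    if total_pages <= 0 or not 0 < hits <= total_pages:
        raise ValueError("need 0 < hits <= total_pages")
    return -math.log2(hits / total_pages)


def resnik_similarity(ic_lcs: float) -> float:
    """Resnik: similarity is the IC of the least common subsumer."""
    return ic_lcs


def lin_similarity(ic_a: float, ic_b: float, ic_lcs: float) -> float:
    """Lin: shared information relative to the total information of both concepts."""
    denom = ic_a + ic_b
    # Convention chosen for this sketch: return 0.0 when both ICs are zero.
    return 2.0 * ic_lcs / denom if denom > 0 else 0.0


def jiang_conrath_distance(ic_a: float, ic_b: float, ic_lcs: float) -> float:
    """Jiang and Conrath: a distance, so smaller values mean more similar."""
    return ic_a + ic_b - 2.0 * ic_lcs


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    total = 1_000_000
    ic_a = information_content(500, total)
    ic_b = information_content(2_000, total)
    ic_lcs = information_content(50_000, total)
    print(resnik_similarity(ic_lcs))
    print(lin_similarity(ic_a, ic_b, ic_lcs))
    print(jiang_conrath_distance(ic_a, ic_b, ic_lcs))
```

In this sketch the LCS of the two concepts still has to come from a taxonomy such as SNOMED-CT; only the probability estimates are taken from hit counts.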




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sánchez, D., Batet, M., Valls, A. (2009). Computing Knowledge-Based Semantic Similarity from the Web: An Application to the Biomedical Domain. In: Karagiannis, D., Jin, Z. (eds) Knowledge Science, Engineering and Management. KSEM 2009. Lecture Notes in Computer Science, vol 5914. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10488-6_6


  • DOI: https://doi.org/10.1007/978-3-642-10488-6_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10487-9

  • Online ISBN: 978-3-642-10488-6

  • eBook Packages: Computer Science, Computer Science (R0)
