Domain-Specific Semantic Relatedness from Wikipedia Structure: A Case Study in Biomedical Text

  • Armin Sajadi
  • Evangelos E. Milios
  • Vlado Kešelj
  • Jeannette C. M. Janssen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9041)


Wikipedia is becoming an important knowledge source in various domain-specific applications based on concept representation. This creates a need for concrete evaluation of Wikipedia as a foundation for computing semantic relatedness between concepts. While lexical resources like WordNet cover generic English well, their coverage of domain-specific terms and named entities is weak, whereas such coverage is one of the strengths of Wikipedia. Furthermore, semantic relatedness methods that rely on the hierarchical structure of a lexical resource are not directly applicable to the Wikipedia link structure, which is not hierarchical and whose links do not capture well-defined semantic relationships like hyponymy.

In this paper we (1) evaluate Wikipedia in a domain-specific semantic relatedness task and demonstrate that Wikipedia-based methods can be competitive with state-of-the-art ontology-based and distributional methods in the biomedical domain; (2) adapt and evaluate the effectiveness of bibliometric methods of various degrees of sophistication on Wikipedia; and (3) propose a new graph-based method for calculating semantic relatedness that outperforms existing methods by considering some specific features of Wikipedia structure.


Semantic Similarity, Semantic Relatedness, Distributional Method, Neighborhood Graph, Computational Linguistics
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Armin Sajadi (1)
  • Evangelos E. Milios (1)
  • Vlado Kešelj (1)
  • Jeannette C. M. Janssen (1)
  1. Faculty of Computer Science, Dalhousie University, Halifax, Canada
