A Generic Document Retrieval Framework Based on UMLS Similarity for Biomedical Question Answering System

  • Mourad Sarrouti
  • Said Ouatik El Alaoui
Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 57)


Document retrieval plays a vital role in biomedical question answering systems: the performance of such a system depends directly on the performance of its document retrieval component, whose main goal is to find a set of citations that are highly likely to contain the answers. In this paper, we propose a biomedical document retrieval framework that retrieves the documents relevant to users' biomedical questions (queries). The framework first uses the GoPubMed search engine to find the top-K results. It then re-ranks these results by computing the semantic similarity between the question and the title of each document using UMLS similarity. We evaluate the framework on the BioASQ 2014 task datasets. The experimental results show that it achieves the best performance, measured by MAP@100, compared to the existing state-of-the-art document retrieval systems.
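The two-stage idea in the abstract (retrieve the top-K documents, then re-rank them by question-to-title similarity) and the MAP@100 metric can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the candidate list stands in for GoPubMed results, and `token_overlap` is a simple placeholder rather than UMLS similarity, which would be plugged in through the `similarity` argument. The average precision shown divides by the total number of relevant documents, which is one common convention; the excerpt does not state which variant the evaluation uses.

```python
from typing import Callable, List, Sequence, Set, Tuple

Document = Tuple[str, str]  # (document id, title); hypothetical representation


def token_overlap(question: str, title: str) -> float:
    """Placeholder similarity: Jaccard overlap of lowercase tokens.

    Assumption: this only stands in for a concept-based measure such as
    UMLS similarity; it is not that measure.
    """
    q_tokens = set(question.lower().split())
    t_tokens = set(title.lower().split())
    if not q_tokens or not t_tokens:
        return 0.0
    return len(q_tokens & t_tokens) / len(q_tokens | t_tokens)


def rerank_by_title_similarity(
    question: str,
    candidates: Sequence[Document],
    similarity: Callable[[str, str], float] = token_overlap,
) -> List[str]:
    """Re-rank top-K candidates by question-to-title similarity, best first."""
    scored = [(similarity(question, title), doc_id) for doc_id, title in candidates]
    # sorted() is stable, so ties keep the order returned by the search engine.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in ranked]


def average_precision_at_k(
    ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 100
) -> float:
    """Average precision over the first k results of one question."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Assumed convention: normalise by the number of relevant documents.
    return precision_sum / len(relevant_ids)


def mean_average_precision(
    runs: Sequence[Tuple[Sequence[str], Set[str]]], k: int = 100
) -> float:
    """Mean of average_precision_at_k over (ranking, relevant set) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision_at_k(r, rel, k) for r, rel in runs) / len(runs)
```

Passing the similarity measure as a parameter keeps the re-ranking step independent of how similarity is computed, so a concept-based scorer could replace the placeholder without changing the rest of the sketch.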


Information retrieval, Biomedical question answering system, GoPubMed, Unified Medical Language System, Semantic similarity



The authors would like to thank the BioASQ challenge [18] for providing us with benchmark datasets.


  1. Abacha, A.B., Zweigenbaum, P.: MEANS: a medical question-answering system combining NLP techniques and semantic web technologies. Inf. Process. Manag. 51(5), 570–594 (2015)
  2. Aronson, A.R.: Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. In: Proceedings of the AMIA Symposium, p. 17. American Medical Informatics Association (2001)
  3. Athenikos, S.J., Han, H.: Biomedical question answering: a survey. Comput. Methods Programs Biomed. 99(1), 1–24 (2010)
  4. Balikas, G., Partalas, I., Ngomo, A.C.N., Krithara, A., Gaussier, E., Paliouras, G.: Results of the BioASQ track of the question answering lab at CLEF 2014. Results of the BioASQ Track of the Question Answering Lab at CLEF 2014, 1181–1193 (2014)
  5. Bodenreider, O.: The unified medical language system (UMLS): integrating biomedical terminology. Nucl. Acids Res. 32(suppl 1), D267–D270 (2004)
  6. Choi, S., Choi, J.: Classification and retrieval of biomedical literatures: SNUMedinfo at CLEF QA track BioASQ 2014. In: Proceedings of Question Answering Lab at CLEF (2014)
  7. Doms, A., Schroeder, M.: GoPubMed: exploring PubMed with the gene ontology. Nucl. Acids Res. 33(suppl 2), W783–W786 (2005)
  8. Dwivedi, S.K., Singh, V.: Research and reviews in question answering system. Procedia Technol. 10, 417–424 (2013)
  9. Gupta, P., Gupta, V.: A survey of text question answering techniques. Int. J. Comput. Appl. 53(4), 1–8 (2012)
  10. Lee, M., Cimino, J., Zhu, H.R., Sable, C., Shanker, V., Ely, J., Yu, H.: Beyond information retrieval medical question answering. In: AMIA Annual Symposium Proceedings, vol. 2006, p. 469. American Medical Informatics Association (2006)
  11. Loni, B.: A Survey of State-of-the-Art Methods on Question Classification, pp. 1–40. Delft University of Technology, Delft (2011)
  12. McInnes, B.T., Pedersen, T., Pakhomov, S.V.: UMLS-Interface and UMLS-Similarity: open source software for measuring paths and semantic similarity. In: AMIA Annual Symposium Proceedings, vol. 2009, p. 431. American Medical Informatics Association (2009)
  13. Neves, M.: HPI in-memory-based database system in task 2b of BioASQ. In: Proceedings of Question Answering Lab at CLEF (2014)
  14. Neves, M., Leser, U.: Question answering for biology. Methods 74, 36–46 (2015)
  15. Ryu, P.M., Jang, M.G., Kim, H.K.: Open domain question answering using Wikipedia-based knowledge model. Inf. Process. Manag. 50(5), 683–692 (2014)
  16. Sarrouti, M., Lachkar, A., Ouatik, S.E.: Biomedical question types classification using syntactic and rule based approach. In: Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, pp. 265–272 (2015)
  17. Strohman, T., Metzler, D., Turtle, H., Croft, W.B.: Indri: a language model-based search engine for complex queries. In: Proceedings of the International Conference on Intelligent Analysis, vol. 2, pp. 2–6. Citeseer (2005)
  18. Tsatsaronis, G., Balikas, G., Malakasiotis, P., Partalas, I., Zschunke, M., Alvers, M.R., Weissenborn, D., Krithara, A., Petridis, S., Polychronopoulos, D., et al.: An overview of the BioASQ large-scale biomedical semantic indexing and question answering competition. BMC Bioinform. 16(1), 138 (2015)
  19. Weissenborn, D., Tsatsaronis, G., Schroeder, M.: Answering factoid questions in the biomedical domain. BioASQ@CLEF 1094 (2013)

Copyright information

© Springer International Publishing Switzerland 2016

Open Access. This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License, which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Authors and Affiliations

  1. Laboratory of Computer Science and Modeling, FSDM, Sidi Mohammed Ben Abdellah University, Fes, Morocco
