Learning Concept-Driven Document Embeddings for Medical Information Search

  • Gia-Hung Nguyen
  • Lynda Tamine
  • Laure Soulier
  • Nathalie Souf
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10259)


Many medical tasks, such as self-diagnosis, health-care assessment, and clinical trial patient recruitment, involve the use of information access tools. A key underlying step in these tasks is document-to-document matching, which largely fails to bridge the gap between the raw-level representations of information in documents and their high-level human interpretation. In this paper, we study how to optimize document representations by leveraging neural approaches that capture latent representations built upon both the words used and validated medical concepts specified in an external resource. We experimentally show the effectiveness of the proposed model in support of two different medical search tasks, namely health search and clinical search for cohorts.
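To make the matching step described in the abstract concrete, here is a minimal, hypothetical Python sketch. It is not the neural model proposed in the paper. It assumes hand-picked two-dimensional toy vectors (`WORD_TABLE`, `CONCEPT_TABLE`) and made-up concept labels (`MI`, `FRACTURE`) in place of embeddings learned from a corpus and an external medical resource. It also uses simple averaging plus concatenation as a stand-in for a learned document representation. The sketch shows how a concept-level view can make two documents that share no words ("heart attack" vs. "myocardial infarction") similar under cosine matching.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical toy embeddings (dimension 2); a real system would learn these.
WORD_TABLE: Dict[str, Vector] = {
    "heart": [1.0, 0.0],
    "attack": [1.0, 0.0],
    "myocardial": [0.0, 1.0],
    "infarction": [0.0, 1.0],
    "broken": [-1.0, 0.0],
    "arm": [-1.0, 0.0],
}
# Hypothetical concept labels standing in for entries of an external medical resource.
CONCEPT_TABLE: Dict[str, Vector] = {
    "MI": [1.0, 0.0],
    "FRACTURE": [0.0, 1.0],
}
DIM = 2


def mean_vector(items: Sequence[str], table: Dict[str, Vector], dim: int = DIM) -> Vector:
    """Average the vectors of the items present in `table` (zero vector if none are)."""
    known = [table[item] for item in items if item in table]
    if not known:
        return [0.0] * dim
    return [sum(vec[k] for vec in known) / len(known) for k in range(dim)]


def document_vector(words: Sequence[str], concepts: Sequence[str]) -> Vector:
    """Concatenate the word-level and concept-level views of a document."""
    return mean_vector(words, WORD_TABLE) + mean_vector(concepts, CONCEPT_TABLE)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    doc_a = document_vector(["heart", "attack"], ["MI"])
    doc_b = document_vector(["myocardial", "infarction"], ["MI"])
    doc_c = document_vector(["broken", "arm"], ["FRACTURE"])

    # Word-only view: the two heart-attack documents share no vocabulary.
    words_a = mean_vector(["heart", "attack"], WORD_TABLE)
    words_b = mean_vector(["myocardial", "infarction"], WORD_TABLE)
    print(round(cosine(words_a, words_b), 3))  # → 0.0

    # Joint word + concept view: the shared concept pulls A and B together.
    print(round(cosine(doc_a, doc_b), 3))  # → 0.5
    print(round(cosine(doc_a, doc_c), 3))  # → -0.5
```

With words alone, the two heart-attack documents score 0.0. Once the shared concept is added, they score about 0.5, while the unrelated fracture document scores about -0.5. A learned model would replace both the toy tables and the averaging step.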


Keywords: Medical information search; Representation learning; Knowledge resource; Medical concepts



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Gia-Hung Nguyen (1)
  • Lynda Tamine (1)
  • Laure Soulier (2)
  • Nathalie Souf (1)
  1. Université de Toulouse, UPS-IRIT, Toulouse, France
  2. Sorbonne Universités-UPMC, Univ Paris 06, LIP6 UMR 7606, Paris, France
