Distributed Representation of Healthcare Text Through Qualitative and Quantitative Analysis

  • J. R. Naveen (Email author)
  • H. B. Barathi Ganesh
  • M. Anand Kumar
  • K. P. Soman
Conference paper
Part of the Lecture Notes in Computational Vision and Biomechanics book series (LNCVB, volume 31)


Many healthcare-related applications use pretrained embeddings, but these are often trained on a general-purpose corpus and then applied to a particular downstream application. One problem with such embeddings is that they do not perform well across different health-text applications, and few studies evaluate embeddings for the health domain. In this paper, a distributional embedding model is trained to obtain word representations from data crawled from the Journal of Medical Case Reports. The resulting distributed embedding model is analyzed qualitatively and quantitatively over the crawled corpus. The qualitative evaluation uses cosine similarity across different categories and is presented visually. The quantitative evaluation is performed with part-of-speech tagging and entity recognition. The embedding model attained a cross-validation accuracy of 91.70% in part-of-speech tagging on the GENIA corpus and 83% accuracy in entity recognition on the i2b2 clinical data.
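
The qualitative evaluation mentioned above relies on cosine similarity between word vectors. The following is a minimal sketch of that measure, not the authors' implementation: the function names `cosine_similarity` and `most_similar`, the `toy_embeddings` dictionary, and its three-dimensional vectors are all hypothetical and were invented for illustration. They are not values from the trained model.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings, top_n=3):
    """Rank every other word in `embeddings` by cosine similarity to `word`."""
    query = embeddings[word]
    scores = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical toy vectors, made up for illustration only.
toy_embeddings = {
    "fever": [0.9, 0.1, 0.0],
    "pyrexia": [0.8, 0.2, 0.1],
    "fracture": [0.0, 0.9, 0.4],
    "aspirin": [0.1, 0.0, 0.95],
}

if __name__ == "__main__":
    for neighbour, score in most_similar("fever", toy_embeddings):
        print(f"{neighbour}: {score:.3f}")
```

With real embeddings, the same ranking could be computed per category (for example, symptoms or drugs) to check whether nearest neighbours fall in the expected category.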



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • J. R. Naveen (1), Email author
  • H. B. Barathi Ganesh (1, 2)
  • M. Anand Kumar (1)
  • K. P. Soman (1)

  1. Amrita School of Engineering, Center for Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India
  2. Arnekt Solution Pvt. Ltd., Magarpatta, Pune, India
