Dense Annotation of Free-Text Critical Care Discharge Summaries from an Indian Hospital and Associated Performance of a Clinical NLP Annotator

Journal of Medical Systems

Abstract

Electronic Health Record (EHR) use in India is generally poor, and structured clinical information is mostly lacking. This work is the first attempt to evaluate unstructured text mining for extracting relevant clinical information from Indian clinical records. We annotated a corpus of 250 discharge summaries from an Intensive Care Unit (ICU) in India with markups for diseases, procedures, and lab parameters, their attributes, and key demographic information and administrative variables such as patient outcomes. In the process, we constructed guidelines for an annotation scheme useful to clinicians in the Indian context. We evaluated the performance of an NLP engine, Cocoa, on a cohort of these Indian clinical records. We produced an annotated corpus of roughly 90 thousand words, which to our knowledge is the first tagged clinical corpus from India. Cocoa was evaluated on a test corpus of 50 documents. The overlap F-scores across the major categories, namely disease/symptoms, procedures, laboratory parameters, and outcomes, are 0.856, 0.834, 0.961, and 0.872, respectively. These results are competitive with results from recent shared tasks based on US records. The annotated corpus and the associated results from the Cocoa engine indicate that unstructured text mining is a viable method for cohort analysis in the Indian clinical context, where structured EHR records are largely absent.
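
The abstract reports overlap F-scores but does not spell out how they are computed. As a rough illustration only, the sketch below assumes the relaxed-match convention often used in clinical NLP shared tasks: a predicted span counts as correct if it shares a category and at least one character with some gold span. The `Span` layout and the function names are hypothetical and are not taken from the paper or from the Cocoa engine.

```python
from typing import List, Tuple

# Hypothetical span layout: (start offset, end offset exclusive, category).
Span = Tuple[int, int, str]


def spans_overlap(a: Span, b: Span) -> bool:
    """Return True if both spans share a category and at least one character."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def overlap_prf(gold: List[Span], predicted: List[Span]) -> Tuple[float, float, float]:
    """Compute relaxed (overlap-based) precision, recall, and F-score.

    Assumption: precision counts predicted spans that overlap any gold span,
    and recall counts gold spans that overlap any predicted span.
    """
    matched_pred = sum(1 for p in predicted if any(spans_overlap(p, g) for g in gold))
    matched_gold = sum(1 for g in gold if any(spans_overlap(g, p) for p in predicted))

    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score
```

Calling `overlap_prf(gold, predicted)` on the spans of a single category would give a per-category score comparable in kind to the figures above, though the paper's exact matching rules may differ from this sketch.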

Author information

Corresponding authors

Correspondence to S. V. Ramanan or Kedar Radhakrishna.

Ethics declarations

Funding

This work was funded entirely through internal funds from St. John’s Research Institute and RelAgent Tech Pvt. Ltd.

Competing Interests

P. Senthil Nathan and S. V. Ramanan are founders of RelAgent Tech Pvt. Ltd., a biomedical text mining company. The other authors declare that they have no competing interests.

Ethics Statement

Ethical approval for the study was granted by the Institutional Ethics Committee (IEC) of St. John’s National Academy of Health Sciences. Patient consent for data collection is obtained as part of routine procedure during admission to the ICU.

Additional information

This article is part of the Topical Collection on Patient Facing Systems

Electronic supplementary material

ESM 1

Supplementary material: a document containing the corpus annotation guidelines (DOCX 340 kb)

About this article

Cite this article

Ramanan, S.V., Radhakrishna, K., Waghmare, A. et al. Dense Annotation of Free-Text Critical Care Discharge Summaries from an Indian Hospital and Associated Performance of a Clinical NLP Annotator. J Med Syst 40, 187 (2016). https://doi.org/10.1007/s10916-016-0541-2

  • DOI: https://doi.org/10.1007/s10916-016-0541-2
