European Journal of Epidemiology, Volume 34, Issue 2, pp 153–162

Use of natural language processing in electronic medical records to identify pregnant women with suicidal behavior: towards a solution to the complex classification problem

  • Qiu-Yue Zhong
  • Leena P. Mittal
  • Margo D. Nathan
  • Kara M. Brown
  • Deborah Knudson González
  • Tianrun Cai
  • Sean Finan
  • Bizu Gelaye
  • Paul Avillach
  • Jordan W. Smoller
  • Elizabeth W. Karlson
  • Tianxi Cai
  • Michelle A. Williams


We developed algorithms to identify pregnant women with suicidal behavior using information extracted from clinical notes in electronic medical records by natural language processing (NLP). Using both codified data and NLP applied to unstructured clinical notes, we first screened pregnant women in Partners HealthCare for suicidal behavior. Psychiatrists manually reviewed clinical charts to identify relevant features for suicidal behavior and to obtain gold-standard labels. Using the adaptive elastic net, we developed algorithms to classify suicidal behavior, and we then validated them in an independent validation dataset. From 275,843 women with codes related to pregnancy or delivery, 9331 women screened positive for suicidal behavior by either codified data (N = 196) or NLP (N = 9145). Using expert-curated features, our algorithm achieved an area under the curve of 0.83. By setting the positive predictive value to be comparable to that of diagnostic codes related to suicidal behavior (0.71), we obtained a sensitivity of 0.34, a specificity of 0.96, and a negative predictive value of 0.83. The algorithm identified 1423 pregnant women with suicidal behavior among the 9331 women who screened positive. Mining unstructured clinical notes using NLP resulted in an 11-fold increase in the number of pregnant women identified with suicidal behavior, compared with sole reliance on diagnostic codes.
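The four operating characteristics reported above are linked through Bayes' rule, so they can be checked against one another. The following is a minimal sketch, not the authors' analysis code: the function names are hypothetical, and the prevalence it derives is only the value implied by the rounded figures in the abstract, not a number reported in the paper.

```python
def ppv(sens, spec, prev):
    """Positive predictive value from sensitivity, specificity, prevalence."""
    true_pos = sens * prev
    false_pos = (1.0 - spec) * (1.0 - prev)
    return true_pos / (true_pos + false_pos)


def npv(sens, spec, prev):
    """Negative predictive value from sensitivity, specificity, prevalence."""
    true_neg = spec * (1.0 - prev)
    false_neg = (1.0 - sens) * prev
    return true_neg / (true_neg + false_neg)


def implied_prevalence(sens, spec, target_ppv):
    """Solve ppv(sens, spec, prev) == target_ppv for prev in closed form.

    With f = 1 - spec:  PPV = s*p / (s*p + f*(1 - p))
    rearranges to       p   = PPV*f / (s*(1 - PPV) + PPV*f).
    """
    fpr = 1.0 - spec
    return target_ppv * fpr / (sens * (1.0 - target_ppv) + target_ppv * fpr)


# Rounded values taken from the abstract.
SENS, SPEC, PPV_TARGET = 0.34, 0.96, 0.71

# Assumption: all four metrics refer to the same labeled sample.
prev = implied_prevalence(SENS, SPEC, PPV_TARGET)
check_ppv = ppv(SENS, SPEC, prev)
check_npv = npv(SENS, SPEC, prev)

print(f"implied prevalence: {prev:.3f}")
print(f"PPV at that prevalence: {check_ppv:.2f}")
print(f"NPV at that prevalence: {check_npv:.2f}")
```

Under these assumptions the implied prevalence among reviewed charts is roughly 0.22, and the negative predictive value computed from it agrees with the reported 0.83 to two decimal places. Because the inputs are rounded, the check is only approximate.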


Natural language processing; Suicidal behavior; Pregnant women; Electronic medical records; Classification algorithm



This research was supported by awards from the National Institutes of Health (the National Institute on Minority Health and Health Disparities: T37-MD001449; and the National Center for Research Resources (NCRR), the National Center for Advancing Translational Sciences (NCATS): 8UL1TR 000170-09). The NIH had no further role in study design; in the collection, analysis, and interpretation of data; in the writing of the manuscript; or in the decision to submit the paper for publication. The authors thank the Enterprise Research Infrastructure & Services at Partners HealthCare for the provision of computing resources. The authors also thank Laurie Bogosian and Stacey Duey of the Research Patient Data Repository at Partners HealthCare for their in-depth support. This research was done in partial fulfillment of the requirements of a Doctor of Science degree by one of the authors (QYZ) in the Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA. The authors thank Dr. Michael G. Napolitano for valuable discussions.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Supplementary material

10654_2018_470_MOESM1_ESM.docx (29 kb)
Supplementary material 1 (DOCX 29 kb)



Copyright information

© Springer Nature B.V. 2018

Authors and Affiliations

  • Qiu-Yue Zhong (1; corresponding author)
  • Leena P. Mittal (2)
  • Margo D. Nathan (2)
  • Kara M. Brown (2)
  • Deborah Knudson González (3)
  • Tianrun Cai (4)
  • Sean Finan (5)
  • Bizu Gelaye (1)
  • Paul Avillach (1, 5, 6)
  • Jordan W. Smoller (1, 7)
  • Elizabeth W. Karlson (4)
  • Tianxi Cai (6, 8)
  • Michelle A. Williams (1)
  1. Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, USA
  2. Division of Women’s Mental Health, Department of Psychiatry, Brigham and Women’s Hospital, Boston, USA
  3. Department of Psychiatry and Behavioral Neurosciences, Morsani College of Medicine, University of South Florida, Tampa, USA
  4. Department of Medicine, Division of Rheumatology, Immunology and Allergy, Brigham and Women’s Hospital, Boston, USA
  5. Children’s Hospital Informatics Program, Boston Children’s Hospital, Boston, USA
  6. Department of Biomedical Informatics, Harvard Medical School, Boston, USA
  7. Psychiatric and Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, USA
  8. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, USA
