Abstract
We discuss specific biomedical natural language processing (NLP) applications that cover a wide spectrum of use cases in translational and health services research. Our use cases focus on four categories of applications: (1) Information Extraction (IE), (2) Document Classification, (3) Patient Classification, and (4) Sentiment Analysis. We show how the extracted information can be used for (a) phenotype identification, (b) comparative effectiveness studies, (c) cohort identification, (d) Meaningful Use, and (e) linking patients’ phenotype and genotype. In addition, we discuss the use of NLP components for de-identification of large collections of patient notes. We review the literature for examples of pediatric NLP applications and show the transferability of select adult clinical NLP applications to the pediatric population.
Copyright information
© 2016 Springer Science+Business Media Singapore
About this chapter
Cite this chapter
Savova, G., Pestian, J., Connolly, B., Miller, T., Ni, Y., Dexheimer, J.W. (2016). Natural Language Processing: Applications in Pediatric Research. In: Hutton, J. (ed.) Pediatric Biomedical Informatics. Translational Bioinformatics, vol 10. Springer, Singapore. https://doi.org/10.1007/978-981-10-1104-7_12
DOI: https://doi.org/10.1007/978-981-10-1104-7_12
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-1102-3
Online ISBN: 978-981-10-1104-7
eBook Packages: Biomedical and Life Sciences (R0)