Natural Language Processing: Applications in Pediatric Research

Part of the book series: Translational Bioinformatics (TRBIO, volume 2)

Abstract

We discuss specific biomedical Natural Language Processing-based applications that cover a wide spectrum of use cases within the field of translational and health services research. In our use cases we focus on four categories of applications: (1) Information Extraction (IE), (2) Document Classification, (3) Patient Classification, and (4) Sentiment Analysis. We show how the extracted information could be used for (a) Phenotype identification, (b) Comparative effectiveness studies, (c) Cohort identification, (d) Meaningful Use, and (e) Linking patients’ phenotype and genotype. In addition, we discuss the use of Natural Language Processing components for de-identification of large collections of patient notes. We review the literature for examples of pediatric natural language processing applications and show the transferability of select adult clinical natural language processing applications to the pediatric population.

Acknowledgements

Dr. Savova’s work was supported in part by NIH grants U54LM008748 and 1U01HG006828. Drs. Deleger’s and Solti’s work was supported in part by NIH grant 5R00LM010227.

Author information

Corresponding author

Correspondence to Guergana K. Savova, Ph.D.

Copyright information

© 2012 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Savova, G.K., Deleger, L., Solti, I., Pestian, J., Dexheimer, J.W. (2012). Natural Language Processing: Applications in Pediatric Research. In: Hutton, J. (ed.) Pediatric Biomedical Informatics. Translational Bioinformatics, vol 2. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-5149-1_10
