Natural Language Processing of Medical Reports

Chapter in: Medical Imaging Informatics

Abstract

A significant amount of information regarding the observations, assessments, and recommendations related to a patient's case is documented within free-text medical reports. The ability to structure and standardize clinical patient data has been a grand goal of medical informatics since the inception of the field, especially if this structuring can be achieved automatically at the patient bedside and within the modus operandi of current medical practice. A computational infrastructure that transforms the process of clinical data collection from an uncontrolled to a highly controlled operation (i.e., precise, completely specified, standard representation) can facilitate medical knowledge acquisition and its application to improve healthcare. Medical natural language processing (NLP) systems attempt to interpret free text to facilitate a clinical, research, or teaching task. An NLP system translates a source language (e.g., free text) into a target surrogate, computer-understandable representation (e.g., first-order logic), which in turn can support the operations of a driving application. NLP is thus a transformation from a representational form that is not very useful from the perspective of a computer (a sequence of characters) to a form that is useful (a logic-based representation of the text meaning). In general, the accuracy and speed of translation are heavily dependent on the end application. This chapter presents work related to natural language processing of clinical reports, covering issues related to representation, computation, and evaluation. We first summarize a number of typical clinical applications. We then present a high-level formalization of the medical NLP problem to show how the various aspects of NLP fit together and complement one another. Examples of approaches that target various forms of representation and degrees of potential accuracy are discussed, followed by the individual NLP subtasks. We conclude this chapter with evaluation methods and a discussion of the directions expected in the processing of clinical medical reports. Throughout, we describe applications illustrating the many open issues revolving around medical natural language processing.

References

  1. Abney S (2002) Bootstrapping. Proc 40th Annual Meeting Assoc Computational Linguistics, pp 360-367.

    Google Scholar 

  2. Aho AV, Corasick MJ (1975) Efficient string matching: Aid to bibliographic search. Comm ACM, 18(6):333-340.

    Article  MATH  MathSciNet  Google Scholar 

  3. Aronson AR (2001) Effective mapping of biomedical text to the UMLS Metathesaurus: The MetaMap program. Proc AMIA Symp, pp 17-21.

    Google Scholar 

  4. Baneyx A, Charlet J, Jaulent MC (2006) Methodology to build medical ontology from textual resources. Proc AMIA Symp, pp 21-25.

    Google Scholar 

  5. Bashyam V (2008) Towards a canonical representation for machine understanding of natural language in radiology reports. Department of Information Studies, PhD dissertation. University of California Los Angeles.

    Google Scholar 

  6. Bashyam V, Taira RK (2005) Indexing anatomical phrases in neuro-radiology reports to the UMLS 2005AA. Proc AMIA Symp pp 26-30.

    Google Scholar 

  7. Bashyam V, Taira RK (2005) A study of lexical behaviour of sentences in chest radiology reports. Proc AMIA Symp p 891.

    Google Scholar 

  8. Bates DW, Evans RS, Murff H, Stetson PD, Pizziferri L, Hripcsak G (2003) Detecting adverse events using information technology. J Am Med Inform Assoc, 10(2):115-128.

    Article  Google Scholar 

  9. Baud R (2004) A natural language based search engine for ICD10 diagnosis encoding. Med Arh, 58(1 Suppl 2):79-80.

    Google Scholar 

  10. Becker GJ (2005) Restructuring cancer clinical trials. J Am Coll Radiol, 2(10):816-817.

    Article  Google Scholar 

  11. Bell GB, Sethi A (2001) Matching records in a national medical patient index. Communications of the ACM, 44(9):83-88.

    Article  Google Scholar 

  12. Berger AL, DellaPietra SA, DellaPietra VJ (1996) A maximum entropy approach to natural language processing. Computational Linguistics, 22(1):39-71.

    Google Scholar 

  13. Berman JJ (2004) Pathology abbreviated: A long review of short terms. Arch Pathol Lab Med, 128(3):347-352.

    Google Scholar 

  14. Berrios DC (2000) Automated indexing for full text information retrieval. Proc AMIA Symp, pp 71-75.

    Google Scholar 

  15. Black A, van de Plassche J, Williams B (1991) Analysis of unknown words through morphological decomposition. Proc 5th Conf European Chapter of the Association of Computational Linguistics, pp 101-106.

    Google Scholar 

  16. Blum A, Mitchell T (1998) Combining labeled and unlabeled data with co-training. Proc 11th Annual Conf Computational Learning Theory, pp 92-100.

    Google Scholar 

  17. Bodenreider O, McCray AT (2003) Exploring semantic groups through visual approaches. J Biomed Inform, 36(6):414-432.

    Article  Google Scholar 

  18. Booker DL, Berman JJ (2004) Dangerous abbreviations. Hum Pathol, 35(5):529-531.

    Article  Google Scholar 

  19. Bouillon P, Rayner M, Chatzichrisafis N, Hockey BA, Santaholma M, Starlander M, Nakao Y, Kanzaki K, Isahara H (2005) A generic multi-lingual open source platform for limited-domain medical speech translation. Proc 10th Annual Conf European Association for Machine Translation, pp 50-58.

    Google Scholar 

  20. Brill E (1995) Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Computational Linguistics, 21(4):543-565.

    Google Scholar 

  21. Budiu R, Anderson JR (2004) Interpretation-based processing: A unified theory of semantic sentence comprehension. Cognitive Science, 28(1):1-44.

    Article  Google Scholar 

  22. Campbell DA, Johnson SB (2001) Comparing syntactic complexity in medical and non-medical corpora. Proc AMIA Symp, pp 90-94.

    Google Scholar 

  23. Campbell DA, Johnson SB (2002) A transformational-based learner for dependency grammars in discharge summaries. Proc ACL-02 Workshop on Natural language Processing in the Biomedical Domain, vol 3, pp 37-44.

    Google Scholar 

  24. Cao H, Markatou M, Melton GB, Chiang MF, Hripcsak G (2005) Mining a clinical data warehouse to discover disease-finding associations using co-occurrence statistics. Proc AMIA Symp, pp 106-110.

    Google Scholar 

  25. Cardie C (1994) Domain-specific Knowledge Acquisition for Conceptual Sentence Analysis. Department of Computer Science PhD dissertation. University of Massachusetts, Amherst.

    Google Scholar 

  26. Carroll J, Minnen G, Pearce D, Canning Y, Devlin S, Tait J (1999) Simplifying text for language-impaired readers. Proc 9th Conf European Chapter of the Association of Computational Linguistics, pp 269-270.

    Google Scholar 

  27. Carter PI (2004) HIPAA Compliance Handbook 2004. Aspen Publishing, Gaithersburg, MD.

    Google Scholar 

  28. Chao G (2002) Recurrent probabilistic modeling and its application to part-of-speech tagging. Proc 40th Annual Meeting Assoc Computational Linguistics: Student Research Workshop, pp 6-11.

    Google Scholar 

  29. Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG (2001) A simple algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inform, 34(5):301-310.

    Article  Google Scholar 

  30. Chapman WW, Chu D, Dowling JN (2007) ConText: An algorithm for identifying contextual features from clinical text. BioNLP 2007: Biological, Translational, and Clinical Language Processing, pp 81-88.

    Google Scholar 

  31. Chapman WW, Fiszman M, Dowling JN, Chapman BE, Rindflesch TC (2004) Identifying respiratory findings in emergency department reports for biosurveillance using MetaMap. Stud Health Technol Inform, 107(Pt 1):487-491.

    Google Scholar 

  32. Charniak E (2001) Unsupervised learning of name structure from coreference data. Proc North American Chapter Assoc Computational Linguistics, pp 48-54.

    Google Scholar 

  33. Chen SF, Goodman J (1999) An empirical study of smoothing techniques for language modeling. Computer Speech and Language, 13(4):359-394.

    Article  Google Scholar 

  34. Chinchor N, Marsh E (1998) MUC-7 named entity task definition. Proc 7th Message Understanding Conference (MUC-7).

    Google Scholar 

  35. Cho PS, Taira RK, Kangarloo H (2002) Text boundary detection of medical reports. Proc AMIA Symp, pp 155-159.

    Google Scholar 

  36. Cho PS, Taira RK, Kangarloo H (2003) Automatic section segmentation of medical reports. Proc AMIA Symp, pp 155-159.

    Google Scholar 

  37. Christensen LM, Haug PJ, Fiszman M (2002) MPLUS: A probabilistic medical language understanding system. Proc ACL-02 Workshop on Natural Language Processing in the Biomedical Domain, vol 3, pp 29-36.

    Google Scholar 

  38. Ciaramita M, Johnson M (2000) Explaining away ambiguity: Learning verb selectional preference with Bayesian networks. Proc 18th Conf Computational Linguistics, vol 1, pp 187-193.

    Google Scholar 

  39. Clegg AB, Shepherd AJ (2007) Benchmarking natural-language parsers for biological applications using dependency graphs. BMC Bioinformatics, 8:24-40.

    Article  Google Scholar 

  40. Coates-Stephens S (1992) The analysis and acquisition of proper names for the understanding of free text. Computers and the Humanities, 26(5):441-456.

    Article  Google Scholar 

  41. Coden AR, Pakhomov SV, Ando RK, Duffy PH, Chute CG (2005) Domain-specific language models and lexicons for tagging. J Biomed Inform, 38(6):422-430.

    Article  Google Scholar 

  42. Cohen KB, Hunter L (2006) A critical review of PASBio's argument structures for biomedical verbs. BMC Bioinformatics, 7 Suppl 3:S5.

    Article  Google Scholar 

  43. Cohn A (1996) Calculi for qualitative spatial reasoning. Artificial Intelligence and Symbolic Mathematical Computation, pp 124-143.

    Google Scholar 

  44. Collins M (2002) Ranking algorithms for named-entity extraction: Boosting and the voted perceptron. Proc 40th Annual Meeting Assoc Computational Linguistics, pp 489-496.

    Google Scholar 

  45. Computational Mdeicine Center (2009) International Challenge: Classifying Clinical Free Text Using Natural Language Processing. http://www.computationalmedicine.org-/challenge . Accessed April 14, 2009.

  46. D'Avolio LW, Litwin MS, Rogers SO, Jr., Bui AA (2008) Facilitating clinical outcomes assessment through the automated identification of quality measures for prostate cancer surgery. J Am Med Inform Assoc, 15(3):341-348.

    Article  Google Scholar 

  47. Dejean H (2000) ALLiS: A symbolic learning system for natural language learning. Proc CoNLL-2000 and LLL-2000, pp 95-98.

    Google Scholar 

  48. DeRose SJ (1988) Grammatical category disambiguation by statistical optimization. Computational Linguistics, 14(1):31-39.

    Google Scholar 

  49. Divita G, Browne AC, Rindflesch TC (1998) Evaluating lexical variant generation to improve information retrieval. Proc AMIA Symp, pp 775-779.

    Google Scholar 

  50. Dolin RH, Alschuler L, Boyer S, Beebe C, Behlen FM, Biron PV, Shabo Shvo A (2006) HL7 Clinical Document Architecture, Release 2. J Am Med Inform Assoc, 13(1):30-39.

    Article  Google Scholar 

  51. Duda RO, Hart PE, Stork DG (2001) Pattern Classification. 2nd edition. Wiley, New York, NY.

    MATH  Google Scholar 

  52. Eck M, Vogel S, Waibel A (2004) Improving statistical machine translation in the medical domain using the unified medical language system. Proc 20th Intl Conf Computational Linguistics.

    Google Scholar 

  53. Eddy SR (2004) What is a hidden Markov model? Nat Biotechnol, 22(10):1315-1316.

    Article  Google Scholar 

  54. Eng J, Eisner JM (2004) Radiology report entry with automatic phrase completion driven by language modeling. RadioGraphics, 24(5):1493-1501.

    Article  Google Scholar 

  55. Fellbaum C (1998) WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA.

    Google Scholar 

  56. Feng D, Burns G, Zhu J, Hovy EH (2008) Towards automated semantic analysis on biomedical research articles. Proc 3rd Intl Joint Conf Natural Language Processing.

    Google Scholar 

  57. Firth JR (1957) Modes of meaning. In: Firth JR (ed) Papers in Linguistics 1934-1951. Oxford University Press, London.

    Google Scholar 

  58. Fisk JM, Mutalik P, Levin FW, Erdos J, Taylor C, Nadkarni P (2003) Integrating query of relational and textual data in clinical databases: A case study. J Am Med Inform Assoc, 10(1):21-38.

    Article  Google Scholar 

  59. Forney Jr GD (1973) The Viterbi algorithm. Proceedings of the IEEE, 61(3):268-278.

    Article  MathSciNet  Google Scholar 

  60. Friedman C, Alderson PO, Austin JH, Cimino JJ, Johnson SB (1994) A general natural-language text processor for clinical radiology. J Am Med Inform Assoc, 1(2):161-174.

    Google Scholar 

  61. Friedman C, Hripcsak G, Shablinsky I (1998) An evaluation of natural language processing methodologies. Proc AMIA Symp:855-859.

    Google Scholar 

  62. Friedman C, Huff SM, Hersh WR, Pattisongordon E, Cimino JJ (1995) The Canon Group's effort: Working toward a merged model. J Am Med Inform Assoc, 2(1):4-18.

    Google Scholar 

  63. Friedman C, Kra P, Rzhetsky A (2002) Two biomedical sublanguages: A description based on the theories of Zellig Harris. J Biomed Inform, 35(4):222-235.

    Article  Google Scholar 

  64. Friedman C, Shagina L, Lussier Y, Hripcsak G (2004) Automated encoding of clinical documents based on natural language processing. J Am Med Inform Assoc, 11(5):392-402.

    Article  Google Scholar 

  65. Goldman S, Zhou Y (2000) Enhancing supervised learning with unlabeled data. Proc 17th Intl Conf Machine Learning (ICML-2000), pp 327-334.

    Google Scholar 

  66. Guihenneuc-Jouyaux C, Richardson S, Longini IM, Jr. (2000) Modeling markers of disease progression by a hidden Markov process: Application to characterizing CD4 cell decline. Biometrics, 56(3):733-741.

    Article  MATH  Google Scholar 

  67. Gundlapalli AV, South BR, Phansalkar S, Kinney AY, Shen S (2008) Application of natural language processing to VA electronic health records to identify phenotypic characteristics for clinical and research purposes. Proc 2008 AMIA Summit on Translational Bioinformatics, pp 36-40.

    Google Scholar 

  68. Gupta A, Ludascher B, Grethe JS, Martone ME (2003) Towards a formalization of disease-specific ontologies for neuroinformatics. Neural Networks 16:1277-1292.

    Article  Google Scholar 

  69. Gupta D, Saul M, Gilbertson J (2004) Evaluation of a deidentification (De-Id) software engine to share pathology reports and clinical documents for research. Am J Clin Pathol, 121(2):176-186.

    Article  Google Scholar 

  70. Hachey B, Alex B, Mecker M (2005) Investigating the effects of selective sampling on the annotation task. Proc 9th Conf Computational Natural Language Processing, pp 144-151.

    Google Scholar 

  71. Haug PJ, Christensen L, Gundersen M, Clemons B, Koehler S, Bauer K (1997) A natural language parsing system for encoding admitting diagnoses. Proc AMIA Symp, pp 814-818.

    Google Scholar 

  72. Heinze DT, Morsch ML, Sheffer RE, Jimmink MA, Jennings MA, Morris WC, Morch AEW (2001) LifeCode: A deployed application for automated medical coding. AI Magazine, 22(2):76-88.

    Google Scholar 

  73. Hersh WR, Campbell EM, Malveau SE (1997) Assessing the feasibility of large-scale natural language processing in a corpus of ordinary medical records: A lexical analysis. Proc AMIA Fall Symp, pp 580-584.

    Google Scholar 

  74. Herzig TW, Johns M (1997) Extraction of medical information from textual sources: A statistical variant of the boundary-word method. J Am Med Inform Assoc:859-859.

    Google Scholar 

  75. Hripcsak G, Austin JH, Alderson PO, Friedman C (2002) Use of natural language processing to translate clinical information from a database of 889,921 chest radiographic reports. Radiology, 224(1):157-163.

    Article  Google Scholar 

  76. Huang Y, Lowe HJ (2007) A novel hybrid approach to automated negation detection in clinical radiology reports. J Am Med Inform Assoc, 14(3):304-311.

    Article  Google Scholar 

  77. Huang Y, Lowe HJ, Klein D, Cucina RJ (2005) Improved identification of noun phrases in clinical radiology reports using a high-performance statistical natural language parser augmented with the UMLS specialist lexicon. J Am Med Inform Assoc, 12(3):275-285.

    Article  Google Scholar 

  78. Huddleston R (1984) Introduction to the Grammar of English. Cambridge University Press, Cambridge, MA.

    Google Scholar 

  79. Humphrey SM, Rogers WJ, Kilicoglu H, Demner-Fushman D, Rindflesch TC (2006) Word sense disambiguation by selecting the best semantic type based on journal descriptor indexing: Preliminary experiment. J Am American Society for Information Science and Technology, 57(1):96-113.

    Article  Google Scholar 

  80. Iwanska LM, Shapiro SC (2000) Natural Language Processing and Knowledge Representation: Language for Knowledge and Knowledge for Language. AAAI Press, Menlo Park, CA.

    MATH  Google Scholar 

  81. Jain AK, Duin RPW, Mao JC (2000) Statistical pattern recognition: A review. IEEE Trans Pattern Analysis and Machine Intelligence, 22(1):4-37.

    Article  Google Scholar 

  82. Jelinek F (1999) Statistical Methods for Speech Recognition. 2nd edition. MIT press, Cambridge, MA.

    Google Scholar 

  83. Johansson C (2000) A context sensitive maximum likelihood approach to chunking. Proc 2nd Workshop on Learning Language in Logic; 4th Conf Computational Natural Language Learning, vol 7, pp 136-138.

    Google Scholar 

  84. Johnson DB, Chu WW, Dionisio JD, Taira RK, Kangarloo H (1999) Creating and indexing teaching files from free-text patient reports. Proc AMIA Symp, pp 814-818.

    Google Scholar 

  85. Johnson SB (1998) Conceptual graph grammar: A simple formalism for sublanguage. Methods Inf Med, 37(4-5):345-352.

    Google Scholar 

  86. Johnson SB (1999) A semantic lexicon for medical language processing. J Am Med Inform Assoc, 6(3):205-218.

    Google Scholar 

  87. Joshi M, Pedersen MJT, Maclin R, Pakhomov S (2006) Kernel methods for word sense disambiguation and acronym expansion. Proc 21st National Conf Artificial Intelligence.

    Google Scholar 

  88. Jurafsky D, Martin JH (2000) Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall, Upper Saddle River, NJ.

    Google Scholar 

  89. Karlsson F (1990) Constraint grammar as a framework for parsing running text. Proc 13th Annual Conf Computational Linguistics, pp 168-173.

    Google Scholar 

  90. Kudo T, Matsumoto Y (2001) Chunking with support vector machines. Proc 2nd Meeting North American Chapter Assoc Computational Linguistics on Language Technologies, pp 192-199.

    Google Scholar 

  91. Lafferty J, McCallum A, Pereira F (2001) Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Proc 18th Intl Conf Machine Learning, pp 282-289.

    Google Scholar 

  92. Le Moigno S, Charlet J, Bourigault D, Degoulet P, Jaulent MC (2002) Terminology extraction from text to build an ontology in surgical intensive care. Proc AMIA Symp, pp 430-434.

    Google Scholar 

  93. Lee DL, Chuang H, Seamons K (1997) Document ranking and the vector-space model. IEEE Software, 14(2):67-75.

    Article  Google Scholar 

  94. Li L, Chase HS, Patel CO, Friedman C, Weng C (2008) Comparing ICD9-encoded diagnoses and NLP-processed discharge summaries for clinical trials pre-screening: A case study. Proc AMIA Symp, pp 404-408.

    Google Scholar 

  95. Lindberg DA, Humphreys BL, McCray AT (1993) The Unified Medical Language System. Methods Inf Med, 32(4):281-291.

    Google Scholar 

  96. Liu K, Chapman W, Hwa R, Crowley RS (2007) Heuristic sample selection to minimize reference standard training set for a part-of-speech tagger. J Am Med Inform Assoc, 14(5):641-650.

    Article  Google Scholar 

  97. Lovis C, Michel PA, Baud R, Scherrer JR (1995) Word segmentation processing: A way to exponentially extend medical dictionaries. Proc MedInfo, vol 8 Pt 1, pp 28-32.

    Google Scholar 

  98. Lyman M, Sager N, Tick L, Nhan N, Borst F, Scherrer JR (1991) The application of natural-language processing to healthcare quality assessment. Med Decis Making, 11(4 Suppl):S65-68.

    Google Scholar 

  99. Manning CD, Schütze H (1999) Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA.

    MATH  Google Scholar 

  100. Marcus MP, Marcinkiewicz MA, Santorini B (1993) Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313-330.

    Google Scholar 

  101. McCallum A, Freitag D, Pereira F (2000) Maximum entropy Markov models for information extraction and segmentation. Proc 7th Intl Conf Machine Learning, pp 591-598.

    Google Scholar 

  102. McCray AT, Bodenreider O, Malley JD, Browne AC (2001) Evaluating UMLS strings for natural language processing. Proc AMIA Symp, pp 448-452.

    Google Scholar 

  103. McDonald DD (1993) Internal and external evidence in the identification and semantic categorization of proper names. Acquisition of Lexical Knowledge from Text: Proc Workshop Sponsored by the Special Interest Group on the Lexicon of the ACL, pp 32-43.

    Google Scholar 

  104. McDonald DD (1996) Internal and external evidence in the identification and semantic categorization of proper names. In: Boguraev B, Pustejovsky J (eds) Corpus Processing for Lexical Acquisition. MIT Press, Cambridge, MA, pp 21-39.

    Google Scholar 

  105. McRoy SW, Ali SS, Haller SM (1997) Uniform knowledge representation for language processing in the B2 system. Natural Language Engineering, 3(2):123-145.

    Article  Google Scholar 

  106. Melton GB, Hripcsak G (2005) Automated detection of adverse events using natural language processing of discharge summaries. J Am Med Inform Assoc, 12(4):448-457.

    Article  Google Scholar 

  107. Meng H, Lam W, Low KF (1999) Learning belief networks for language understanding. Proc ASRU.

    Google Scholar 

  108. Meystre S, Haug PJ (2005) Automation of a problem list using natural language processing. BMC Med Inform Decis Mak, 5:30.

    Article  Google Scholar 

  109. Meystre S, Haug PJ (2006) Natural language processing to extract medical problems from electronic clinical documents: Performance evaluation. J Biomed Inform, 39(6):589-599.

    Article  Google Scholar 

  110. Mikheev A (2000) Tagging sentence boundaries. Proc 1st North American Chapter Assoc Computational Linguistics Conf, pp 264-271.

    Google Scholar 

  111. Miller GA, Beckwith R, Fellbaum C, Gross D, Miller KJ (1990) Introduction to WordNet: An on-line lexical database. Intl J Lexicography, 3(4):235-244.

    Article  Google Scholar 

  112. Miller JE, Torii M, Vijay-Shanker K (2007) Adaptation of POS tagging for multiple biomedical domains. BioNLP 2007: Biological, Translational, and Clinical Language Processing, pp 179-180.

    Book  Google Scholar 

  113. Minsky ML, Papert S (1988) Perceptrons: An Introduction to Computational Geometry. Expanded edition. MIT Press, Cambridge, MA.

    Google Scholar 

  114. Molina A, Pla F (2002) Shallow parsing using specialized HMMs. J Machine Learning Research, 2(4):595-613.

    Article  MATH  Google Scholar 

  115. Nadkarni P, Chen R, Brandt C (2001) UMLS concept indexing for production databases: A feasibility study. J Am Med Inform Assoc, 8(1):80-91.

    Google Scholar 

  116. Navigli R (2009) Word sense disambiguation: A survey. ACM Computing Surveys, 41(2):1-69.

    Article  Google Scholar 

  117. Neamatullah I, Douglass MM, Lehman LWH, Reisner A, Villarroel M, Long WJ, Szolovits P, Moody GB, Mark RG, Clifford GD (2008) Automated de-identification of free-text medical records. BMC Medical Informatics and Decision Making, 8(32):1-17.

    Google Scholar 

  118. Nelson SJ, Olson NE, Fuller L, Tuttle MS, Cole WG, Sherertz DD (1995) Identifying concepts in medical knowledge. Proc MedInfo, vol 8, pp 33-36.

    Google Scholar 

  119. Nguyen N, Guo Y (2007) Comparisons of sequence labeling algorithms and extensions. Proc 24th Intl Conf Machine Learning, pp 681-688.

    Google Scholar 

  120. Pakhomov S, Pedersen T, Chute CG (2005) Abbreviation and acronym disambiguation in clinical discourse. Proc AMIA Symp, pp 589-593.

    Google Scholar 

  121. Pedersen MJT, Banerjee S, Patwardhan S (2005) Maximizing semantic relatedness to perform word sense disambiguation (Technical Report). University of Minnesota Supercomputing Institute.

    Google Scholar 

  122. Penz JF, Wilcox AB, Hurdle JF (2007) Automated identification of adverse events related to central venous catheters. J Biomed Inform, 40(2):174-182.

    Article  Google Scholar 

  123. Pestian JP, Itert L, Duch W (2004) Development of a pediatric text-corpus for part-of-speech tagging. In: Wierzchon ST, Trojanowski K (eds) Intelligent Information Processing and the Web. Springer, pp 219-226.

    Google Scholar 

  124. Pierce D, Cardie C (2001) Limitations of co-training for natural language learning from large datasets. Proc 2001 Conf Empirical Methods in Natural Language Processing, pp 1–9.

    Google Scholar 

  125. Polackova G (2008) Understanding and use of phrasal verbs and idioms in medical/nursing texts. Bratisl Lek Listy, 109(11):531-532.

    Google Scholar 

  126. Pyper C, Amery J, Watson M, Crook C (2004) Patients' experiences when accessing their on-line electronic patient records in primary care. British Journal of General Practice, 54(498):38-43.

    Google Scholar 

  127. Quinlan JR (1993) C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA.

    Google Scholar 

  128. Rabiner LR (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE, 77(2):257-286.

    Article  Google Scholar 

  129. Radiological Society of North America (2009) RadLex: A Lexicon for Uniform Indexing and Retrieval of Radiology Information Resources. http://www.rsna.org/radlex/ . Accessed April 14, 2009.

  130. Ratnaparkhi A (1996) A maximum entropy model for part-of-speech tagging. Proc Conf Empirical Methods in Natural Language Processing, pp 133-142.

    Google Scholar 

  131. Ratnaparkhi A (1998) Maximum Entropy Models for Natural Language Ambiguity Resolution. Department of Computer and Information Science PhD dissertation. University of Pennsylvania.

    Google Scholar 

  132. Rind DM, Kohane IS, Szolovits P, Safran C, Chueh HC, Barnett GO (1997) Maintaining the confidentiality of medical records shared over the Internet and the World Wide Web. Ann Intern Med, 127(2):138-141.

    Google Scholar 

  133. Roth D (1999) Memory based learning (Technical Report UIUCDCS-R-99-2125). Department of Computer Science, University of Illinois at Urbana-Champaign.

    Google Scholar 

  134. Ruch P, Baud R, Geissbuhler A (2003) Using lexical disambiguation and named-entity recognition to improve spelling correction in the electronic patient record. Artificial Intelligence in Medicine, 29(1-2):169-184.

    Article  Google Scholar 

  135. Ruch P, Baud RH, Rassinoux AM, Bouillon P, Robert G (2000) Medical document anonymization with a semantic lexicon. Proc AMIA Symp, pp 729-733.

    Google Scholar 

  136. Ruppenhofer J, Ellsworth M, Petruck M, Johnson C (2005) FrameNet II: Extended Theory and Practice (Technical Report). ICSI, Berkeley, CA.

    Google Scholar 

  137. Sager N, Lyman M, Nhan NT, Tick LJ (1995) Medical language processing: Applications to patient data representation and automatic encoding. Methods Inf Med, 34(1-2):140-146.

    Google Scholar 

  138. Salton G (1988) Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley, Reading, MA.

    Google Scholar 

  139. Sang EFTK, Buchholz S (2000) Introduction to the CoNLL-2000 shared task: Chunking. Proc 2nd Workshop on Learning Language in Logic; 4th Conf Computational Natural Language Learning, vol 7, pp 127-132.

    Google Scholar 

  140. Savova GK, Coden AR, Sominsky IL, Johnson R, Ogren PV, de Groen PC, Chute CG (2008) Word sense disambiguation across two domains: Biomedical literature and clinical notes. J Biomed Inform, 41(6):1088-1100.

    Article  Google Scholar 

  141. Schulz S, Hahn U (2000) Morpheme-based, cross-lingual indexing for medical document retrieval. Int J Med Inform, 58-59:87-99.

    Article  Google Scholar 

  142. Schulz S, Honeck M, Hahn U (2002) Biomedical text retrieval in languages with a complex morphology. Proc Workshop on NLP in the Biomedical Domain, pp 61-68.

    Google Scholar 

  143. Skut W, Brants T (1998) Chunk tagger: Statistical recognition of noun phrases. Proc ESSLLI-1998 Workshop on Automated Acquisition of Syntax and Parsing.

    Google Scholar 

  144. Smith B, Ceusters W, Klagges B, Kohler J, Kumar A, Lomax J, Mungall C, Neuhaus F, Rector AL, Rosse C (2005) Relations in biomedical ontologies. Genome Biol, 6(5):R46.

    Article  Google Scholar 

  145. Smith L, Rindflesch T, Wilbur WJ (2004) MedPost: A part-of-speech tagger for biomedical text. Bioinformatics, 20(14):2320-2321.

    Article  Google Scholar 

  146. Strzalkowski T (1999) Natural Language Information Retrieval. Kluwer Academic, Boston, MA.

    Google Scholar 

  147. Sweeney L (1996) Replacing personally-identifying information in medical records: The Scrub system. Proc AMIA Symp, pp 333-337.

    Google Scholar 

  148. Taira R, Bui AA, Hsu W, Bashyam V, Dube S, Watt E, Andrada L, El-Saden S, Cloughesy T, Kangarloo H (2008) A tool for improving the longitudinal imaging characterization for neuro-oncology cases. Proc AMIA Symp, pp 712-716.

    Google Scholar 

  149. Taira RK, Bui AA, Kangarloo H (2002) Identification of patient name references within medical documents using semantic selectional restrictions. Proc AMIA Symp, pp 757-761.

    Google Scholar 

  150. Tang M, Luo X, Roukos S (2002) Active learning for statistical natural language parsing. Proc 40th Ann Meeting Assoc Computational Linguistics, Philadelphia, PA, pp 120-127.

    Google Scholar 

  151. Taskar B, Klein D, Collins M, Koller D, Manning C (2004) Max-margin parsing. Proc Empirical Methods in Natural Language Processing.

    Google Scholar 

  152. Tersmette S, Moore M (1988) Boundary word techniques for isolating multiword terminologies. Proc Ann Symp Computer Applications in Medical Care, pp 207-211.

    Google Scholar 

  153. Thede SM, Harper MP (1999) A second-order hidden Markov model for part-of-speech tagging. Proc 37th Annual Meeting ACL on Computational Linguistics, pp 175-182.

    Google Scholar 

  154. Thompson CA, Califf ME, Mooney RJ (1999) Active learning for natural language parsing and information extraction. Proc 16th Intl Machine Learning Conf, Bled, Slovenia, pp 406-414.

  155. Tjong Kim Sang EF (2000) Noun phrase recognition by system combination. Proc 1st Meeting of the North American Chapter for the Association for Computational Linguistics, pp 50-55.

  156. Tolentino HD, Matters MD, Walop W, Law B, Tong W, Liu F, Fontelo P, Kohl K, Payne DC (2007) A UMLS-based spell checker for natural language processing in vaccine safety. BMC Med Inform Decis Mak, 7:3.

  157. Tomanek K, Wermter J, Hahn U (2007) A reappraisal of sentence and token splitting for life sciences documents. Stud Health Technol Inform, 129(Pt 1):524-528.

  158. Trieschnigg D, Kraaij W, de Jong F (2007) The influence of basic tokenization on biomedical document retrieval. Proc 30th Ann Intl ACM SIGIR Conf Research and Development in Information Retrieval, pp 803-804.

  159. Uzuner O, Luo Y, Szolovits P (2007) Evaluating the state-of-the-art in automatic de-identification. J Am Med Inform Assoc, 14(5):550-563.

  160. van den Bosch A, Buchholz S (2001) Shallow parsing on the basis of words only: A case study. Proc 40th Annual Meeting Assoc Computational Linguistics, pp 433-440.

  161. Veenstra J, Van den Bosch A (2000) Single-classifier memory-based phrase chunking. Proc CoNLL, Lisbon, Portugal, pp 157-159.

  162. Vilain M, Day D (2000) Phrase parsing with rule sequence processors: An application to the shared CoNLL task. Proc CoNLL-2000 and LLL-2000, pp 160-162.

  163. Weeber M, Mork JG, Aronson AR (2001) Developing a test collection for biomedical word sense disambiguation. Proc AMIA Symp, pp 746-750.

  164. Xiao J, Wang X, Liu B (2007) The study of a nonstationary maximum entropy Markov model and its application on the pos-tagging task. ACM Trans Asian Language Information Processing, 6(2):1-29.

  165. Yarowsky D (1995) Unsupervised word sense disambiguation rivaling supervised methods. Proc 33rd Annual Meeting Assoc Computational Linguistics, pp 189-196.

  166. Yu H, Hripcsak G, Friedman C (2002) Mapping abbreviations to full forms in biomedical articles. J Am Med Inform Assoc, 9(3):262-272.

  167. Zeng QT, Tse T (2006) Exploring and developing consumer health vocabularies. J Am Med Inform Assoc, 13(1):24-29.

  168. Zhou GD, Su J, Tey TG (2000) Hybrid text chunking. Proc 2nd Workshop on Learning Language in Logic; 4th Conf Computational Natural Language Learning, vol 7, pp 163-165.

  169. Zitouni I (2007) Backoff hierarchical class n-gram language models: Effectiveness to model unseen events in speech recognition. Computer Speech and Language, 21(1):88-104.

  170. Zou Q, Chu WW, Morioka C, Leazer GH, Kangarloo H (2003) IndexFinder: A method of extracting key concepts from clinical texts for indexing. Proc AMIA Symp, pp 763-767.

Author information

Corresponding author

Correspondence to Ricky K. Taira.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Taira, R.K. (2010). Natural Language Processing of Medical Reports. In: Bui, A., Taira, R. (eds) Medical Imaging Informatics. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-0385-3_6

  • DOI: https://doi.org/10.1007/978-1-4419-0385-3_6

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-0384-6

  • Online ISBN: 978-1-4419-0385-3

  • eBook Packages: Engineering, Engineering (R0)
