Natural Language Processing of Medical Reports

Taira, Ricky K.

doi:10.1007/978-1-4419-0385-3_6

Abstract

A significant amount of information regarding the observations, assessments, and recommendations related to a patient's case is documented within free-text medical reports. The ability to structure and standardize clinical patient data has been a grand goal of medical informatics since the inception of the field, especially if this structuring can be achieved automatically at the patient bedside and within the modus operandi of current medical practice. A computational infrastructure that transforms the process of clinical data collection from an uncontrolled to a highly controlled operation (i.e., precise, completely specified, standard representation) can facilitate medical knowledge acquisition and its application to improve healthcare. Medical natural language processing (NLP) systems attempt to interpret free text to facilitate a clinical, research, or teaching task. An NLP system translates a source language (e.g., free text) to a target surrogate, computer-understandable representation (e.g., first-order logic), which in turn can support the operations of a driving application. NLP is thus a transformation from a representational form that is not very useful from the perspective of a computer (a sequence of characters) to a form that is useful (a logic-based representation of the text meaning). In general, the accuracy and speed of translation depend heavily on the end application. This chapter presents work related to natural language processing of clinical reports, covering issues related to representation, computation, and evaluation. We first summarize a number of typical clinical applications. We then present a high-level formalization of the medical NLP problem in order to provide structure as to how various aspects of NLP fit and complement one another. Examples of approaches that target various forms of representations and degrees of potential accuracy are discussed. Individual NLP subtasks are subsequently discussed. We conclude this chapter with evaluation methods and a discussion of the directions expected in the processing of clinical medical reports. Throughout, we describe applications illustrating the many open issues revolving around medical natural language processing.
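
To make the idea of translating a sequence of characters into a logic-based representation concrete, the following is a minimal Python sketch and not the chapter's method. It assumes tiny hypothetical lexicons (`FINDINGS`, `ANATOMY`, `NEGATION_CUES`) and an ad hoc, first-order-logic-like output notation; the names `Finding`, `interpret`, and `to_logic` are illustrative only. A real medical NLP system would rely on far richer lexical, syntactic, and semantic resources.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy lexicons, assumed for illustration only.
FINDINGS = {"mass", "nodule", "effusion"}
ANATOMY = {"lung", "liver", "pleura"}
NEGATION_CUES = {"no", "without"}


@dataclass(frozen=True)
class Finding:
    """Structured surrogate for one free-text sentence."""
    finding: str
    location: Optional[str]
    negated: bool

    def to_logic(self) -> str:
        # Render an ad hoc first-order-logic-like string.
        body = f"{self.finding}(x)"
        if self.location:
            body += f" & located_in(x, {self.location})"
        prefix = "not exists" if self.negated else "exists"
        return f"{prefix} x. ({body})"


def interpret(sentence: str) -> Optional[Finding]:
    """Map a sentence (a sequence of characters) to a Finding, or None."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    finding = next((t for t in tokens if t in FINDINGS), None)
    if finding is None:
        return None
    location = next((t for t in tokens if t in ANATOMY), None)
    # Treat the finding as negated if a negation cue precedes it.
    index = tokens.index(finding)
    negated = any(t in NEGATION_CUES for t in tokens[:index])
    return Finding(finding, location, negated)


result = interpret("No mass is seen in the right lung.")
print(result.to_logic())  # not exists x. (mass(x) & located_in(x, lung))
```

Even this toy version shows why the target representation matters to the driving application: once negation and location are explicit fields, they can be queried directly instead of being re-read from the text.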

Author information

Authors and Affiliations

Medical Imaging Informatics Group, Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles, 924 Westwood Blvd., Suite 420, Los Angeles, CA, 90024, USA
Ricky K. Taira

Corresponding author

Correspondence to Ricky K. Taira.

Editor information

Editors and Affiliations

Medical Imaging Informatics Group, University of California, Los Angeles, Westwood Blvd. 924, Los Angeles, 90024, U.S.A.
Alex A.T. Bui
Medical Imaging Informatics Group, University of California, Los Angeles, Westwood Blvd. 924, Los Angeles, 90024, U.S.A.
Ricky K. Taira

About this chapter

Cite this chapter

Taira, R.K. (2010). Natural Language Processing of Medical Reports. In: Bui, A., Taira, R. (eds) Medical Imaging Informatics. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-0385-3_6

DOI: https://doi.org/10.1007/978-1-4419-0385-3_6
Published: 10 October 2009
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-0384-6
Online ISBN: 978-1-4419-0385-3
eBook Packages: Engineering, Engineering (R0)
