Abstract
Much of the information about a patient's case, including the observations, assessments, and recommendations, is documented in free-text medical reports. Structuring and standardizing clinical patient data has been a grand goal of medical informatics since the inception of the field, especially if this structuring can be achieved (automatically) at the patient bedside and within the modus operandi of current medical practice. A computational infrastructure that transforms clinical data collection from an uncontrolled operation into a highly controlled one (i.e., one yielding a precise, completely specified, standard representation) can facilitate the acquisition of medical knowledge and its application to improving healthcare. Medical natural language processing (NLP) systems attempt to interpret free text in support of a clinical, research, or teaching task. An NLP system translates a source language (e.g., free text) into a surrogate, computer-understandable target representation (e.g., first-order logic), which in turn can support the operations of a driving application. NLP is thus a transformation from a representational form that is of little use to a computer (a sequence of characters) into one that is useful (a logic-based representation of the text's meaning). In general, the accuracy and speed required of this translation depend heavily on the end application. This chapter presents work on the natural language processing of clinical reports, covering issues of representation, computation, and evaluation. We first summarize several typical clinical applications. We then present a high-level formalization of the medical NLP problem to show how the various aspects of NLP fit together and complement one another. Examples of approaches that target different forms of representation and different degrees of potential accuracy are discussed, followed by the individual NLP subtasks.
We conclude this chapter with evaluation methods and a discussion of expected future directions in the processing of clinical reports. Throughout, we describe applications that illustrate the many open issues in medical natural language processing.
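The abstract frames NLP as a translation from a sequence of characters into a logic-based representation of meaning. The Python sketch below makes that framing concrete for a single sentence type. It is a toy illustration under stated assumptions, not the chapter's method: the lexicons, the negation cue list, and the predicate names `present`/`absent` are all hypothetical, and the output is only a tiny fragment of first-order logic (ground atoms).

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy lexicons mapping surface phrases to concept identifiers.
# A real system would draw on a controlled terminology instead.
FINDINGS = {
    "pleural effusion": "pleural_effusion",
    "pneumothorax": "pneumothorax",
    "nodule": "nodule",
}
LOCATIONS = {
    "left lung": "left_lung",
    "right lung": "right_lung",
    "right upper lobe": "right_upper_lobe",
}
# Hypothetical negation cues; any cue earlier in the sentence negates a finding.
NEGATION_CUES = ("no", "without", "negative for")


@dataclass(frozen=True)
class Assertion:
    """A ground predicate such as present(nodule, right_upper_lobe)."""

    predicate: str
    finding: str
    location: Optional[str] = None

    def __str__(self) -> str:
        args = [self.finding] if self.location is None else [self.finding, self.location]
        return f"{self.predicate}({', '.join(args)})"


def _word_pattern(phrase: str) -> str:
    # Match the phrase only as whole words, so "no" does not match inside "nodule".
    return r"\b" + re.escape(phrase) + r"\b"


def find_concepts(text: str, lexicon: dict[str, str]) -> list[tuple[int, str]]:
    """Return (character offset, concept id) pairs for lexicon phrases in text."""
    hits = []
    for phrase, concept in lexicon.items():
        for match in re.finditer(_word_pattern(phrase), text):
            hits.append((match.start(), concept))
    return sorted(hits)


def to_logic(sentence: str) -> list[Assertion]:
    """Map one free-text sentence (a character sequence) to predicate assertions."""
    text = sentence.lower()
    locations = find_concepts(text, LOCATIONS)
    # Naive attachment: every finding gets the first location in the sentence.
    location = locations[0][1] if locations else None
    assertions = []
    for offset, finding in find_concepts(text, FINDINGS):
        preceding = text[:offset]
        negated = any(re.search(_word_pattern(cue), preceding) for cue in NEGATION_CUES)
        assertions.append(Assertion("absent" if negated else "present", finding, location))
    return assertions


if __name__ == "__main__":
    # Prints absent(pneumothorax, left_lung) and absent(pleural_effusion, left_lung).
    for assertion in to_logic("No pneumothorax or pleural effusion in the left lung."):
        print(assertion)
```

Even this toy shows why accuracy depends on the representation targeted: treating everything before a finding as the negation scope, and attaching every finding to the first location mentioned, both break down on longer sentences. Those are exactly the kinds of subtasks (concept recognition, negation detection, relation attachment) that a practical system must handle with far more care.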
Copyright information
© 2010 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Taira, R.K. (2010). Natural Language Processing of Medical Reports. In: Bui, A., Taira, R. (eds) Medical Imaging Informatics. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-0385-3_6
DOI: https://doi.org/10.1007/978-1-4419-0385-3_6
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-0384-6
Online ISBN: 978-1-4419-0385-3
eBook Packages: Engineering, Engineering (R0)