Clinical Text Mining, pp 55–82

Basic Building Blocks for Clinical Text Processing

  • Hercules Dalianis
  • Chapter
  • Open Access
  • First Online: 15 May 2018

Abstract

This chapter presents the basic building blocks for clinical text processing and relates them to the building blocks for standard text processing using natural language processing techniques.


Author information

Authors and Affiliations

  1. DSV-Stockholm University, Kista, Sweden

    Hercules Dalianis


Rights and permissions

This chapter is published under an open access license. Please check the 'Copyright Information' section either on this page or in the PDF for details of this license and what re-use is permitted. If your intended use exceeds what is permitted by the license or if you are unable to locate the licence and re-use information, please contact the Rights and Permissions team.

Copyright information

© 2018 The Author(s)

About this chapter

Cite this chapter

Dalianis, H. (2018). Basic Building Blocks for Clinical Text Processing. In: Clinical Text Mining. Springer, Cham. https://doi.org/10.1007/978-3-319-78503-5_7

  • DOI: https://doi.org/10.1007/978-3-319-78503-5_7

  • Published: 15 May 2018

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-78502-8

  • Online ISBN: 978-3-319-78503-5

  • eBook Packages: Computer Science (R0)
