Basic Building Blocks for Clinical Text Processing

  • Hercules Dalianis
Open Access


This chapter presents the basic building blocks for clinical text processing and relates them to the corresponding building blocks of standard text processing based on natural language processing techniques.


Copyright information

© The Author(s) 2018

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Authors and Affiliations

  • Hercules Dalianis (1)
  1. DSV-Stockholm University, Kista, Sweden
