Basic Building Blocks for Clinical Text Processing

Dalianis, Hercules

doi:10.1007/978-3-319-78503-5_7

Abstract

This chapter presents the basic building blocks for clinical text processing and relates them to the building blocks for standard text processing using natural language processing techniques.

Author information

Authors and Affiliations

DSV-Stockholm University, Kista, Sweden
Hercules Dalianis

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this chapter

Cite this chapter

Dalianis, H. (2018). Basic Building Blocks for Clinical Text Processing. In: Clinical Text Mining. Springer, Cham. https://doi.org/10.1007/978-3-319-78503-5_7

DOI: https://doi.org/10.1007/978-3-319-78503-5_7
Published: 15 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78502-8
Online ISBN: 978-3-319-78503-5
eBook Packages: Computer Science (R0)
