Characteristics of Patient Records and Clinical Corpora

Dalianis, Hercules

doi:10.1007/978-3-319-78503-5_4

Abstract

This chapter details the linguistic characteristics of patient record text, such as spelling errors, domain-specific abbreviations, and negation and assertion expressions, for English, Swedish and other languages.

Author information

Authors and Affiliations

DSV-Stockholm University, Kista, Sweden
Hercules Dalianis

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this chapter

Cite this chapter

Dalianis, H. (2018). Characteristics of Patient Records and Clinical Corpora. In: Clinical Text Mining. Springer, Cham. https://doi.org/10.1007/978-3-319-78503-5_4

DOI: https://doi.org/10.1007/978-3-319-78503-5_4
Published: 15 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78502-8
Online ISBN: 978-3-319-78503-5
eBook Packages: Computer Science, Computer Science (R0)
