Clinical Text Mining pp 21–34

Characteristics of Patient Records and Clinical Corpora

  • Hercules Dalianis
  • Chapter
  • Open Access
  • First Online: 15 May 2018
  • 17k Accesses

  • 4 Citations

Abstract

This chapter details the linguistic characteristics of patient record text, such as spelling errors, domain-specific abbreviations, and negation and assertion expressions, in English, Swedish and other languages.

References

  • Afzal, Z., Pons, E., Kang, N., Sturkenboom, M. C. J. M., Schuemie, M. J., & Kors, J. A. (2014). ContextD: An algorithm to identify contextual properties of medical terms in a Dutch clinical corpus. BMC Bioinformatics, 15(1), 373.

  • Allvin, H., Carlsson, E., Dalianis, H., Danielsson-Ojala, R., Daudaravicius, V., Hassel, M., et al. (2011). Characteristics of Finnish and Swedish intensive care nursing narratives: A comparative analysis to support the development of clinical language technologies. Journal of Biomedical Semantics, 2(Suppl 3), 1–11.

  • Aramaki, E., Miura, Y., Tonoike, M., Ohkuma, T., Masuichi, H., Waki, K., et al. (2010). Extraction of adverse drug effects from clinical records. Studies in Health Technology and Informatics, 160(Pt 1), 739–743.

  • Asamura, H., Wittekind, C., & Sobin, L. H. (2014). TNM Atlas: Illustrated Guide to the TNM Classification of Malignant Tumours. New York: Wiley.

  • Attardi, G., Cozza, V., & Sartiano, D. (2015). Annotation and extraction of relations from Italian medical records. In Proceedings of the 6th Italian Information Retrieval Workshop, Cagliari, Italy.

  • Boytcheva, S., Angelova, G., Angelov, Z., & Tcharaktchiev, D. (2015). Text mining and big data analytics for retrospective analysis of clinical texts from outpatient care. Cybernetics and Information Technologies, 15(4), 58–77.

  • Boytcheva, S., Nikolova, I., Angelova, G., & Angelov, Z. (2017b). Identification of risk factors in clinical texts through association rules. In Proceedings of RANLP Workshop on Biomedical Natural Language Processing (pp. 64–72).

  • Cederblom, S. (2005). Medicinska förkortningar och akronymer [Medical abbreviations and acronyms]. Lund: Studentlitteratur.

  • Chapman, W. W., Bridewell, W., Hanbury, P., Cooper, G. F., & Buchanan, B. G. (2001). A simple algorithm for identifying negated findings and diseases in discharge summaries. Journal of Biomedical Informatics, 34(5), 301–310.

  • Chazard, E., Ficheur, G., Bernonville, S., Luyckx, M., & Beuscart, R. (2011). Data mining to generate adverse drug events detection rules. IEEE Transactions on Information Technology in Biomedicine, 15(6), 823–830.

  • Cotik, V., Filippo, D., Uszkoreit, H., & Xu, F. (2017). Annotation of entities and relations in Spanish radiology reports. In Proceedings of Recent Advances in Natural Language Processing, RANLP 2017, Varna, Bulgaria (pp. 177–184).

  • Dalianis, H. (2014). Clinical text retrieval - An overview of basic building blocks and applications. In Professional Search in the Modern World (pp. 147–165). Berlin: Springer.

  • Dalianis, H., Hassel, M., & Velupillai, S. (2009). The Stockholm EPR Corpus-characteristics and some initial findings. In Proceedings of ISHIMR 2009, Evaluation and Implementation of e-Health and Health Information Initiatives: International Perspectives. 14th International Symposium for Health Information Management Research (pp. 243–249).

  • Dalianis, H., Henriksson, A., Kvist, M., Velupillai, S., & Weegar, R. (2015). HEALTH BANK–A workbench for data science applications in healthcare. In J. Krogstie, G. Juel-Skielse, & V. Kabilan (Eds.), Proceedings of the CAiSE-2015 Industry Track Co-located with 27th Conference on Advanced Information Systems Engineering (CAiSE 2015), Stockholm, Sweden, June 11, 2015, CEUR (Vol. 1381, pp. 1–18). https://doi.org/urn:nbn:de:0074-1381-0E.

  • Dalianis, H., & Skeppstedt, M. (2010). Creating and evaluating a consensus for negated and speculative words in a Swedish clinical corpus. In Proceedings of the Workshop on Negation and Speculation in Natural Language Processing (pp. 5–13). Association for Computational Linguistics.

  • Ehrentraut, C., Tanushi, H., Tiedemann, J., & Dalianis, H. (2012). Detection of hospital acquired infections in sparse and noisy Swedish patient records. In Proceedings of the Sixth Workshop on Analytics for Noisy Unstructured Text Data (AND 2012) Held in Conjunction with Coling 2012, Bombay. ACM Digital Library.

  • Eriksson, R., Jensen, P. B., Frankild, S., Jensen, L. J., & Brunak, S. (2013). Dictionary construction and identification of possible adverse drug events in Danish clinical narrative text. Journal of the American Medical Informatics Association, 20(5), 947–953.

  • Grigonyte, G., Kvist, M., Velupillai, S., & Wirén, M. (2014). Improving readability of Swedish electronic health records through lexical simplification: First results. In Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations – PITR, Gothenburg, Sweden, April 2014 (pp. 74–83). Association for Computational Linguistics. http://www.aclweb.org/anthology/W14-1209. Accessed 11 Jan 2018.

  • Groopman, J. E. (2007). How Doctors Think. New York: Houghton Mifflin Company.

  • Grouin, C., & Névéol, A. (2014). De-identification of clinical notes in French: Towards a protocol for reference corpus development. Journal of Biomedical Informatics, 50, 151–161.

  • Isenius, N. (2012). Abbreviation Detection in Swedish Medical Records. The Development of SCAN, A Swedish Clinical Abbreviation Normalizer. Master’s thesis, Department of Computer and Systems Sciences, Stockholm University.

  • Isenius, N., Velupillai, S., & Kvist, M. (2012). Initial results in the development of SCAN. A Swedish clinical abbreviation normalizer. In CLEFeHealth 2012 Workshop on Cross-Language Evaluation of Methods, Applications, and Resources for eHealth Document Analysis, Rome.

  • Jensen, K., Soguero-Ruiz, C., Mikalsen, K. O., Lindsetmo, R.-O., Kouskoumvekaki, I., Girolami, M., et al. (2017). Analysis of free text in electronic health records for identification of cancer patient trajectories. Scientific Reports, 7, 46226.

  • Koeling, R., Carroll, J., Tate, A. R., & Nicholson, A. (2011). Annotating a corpus of clinical text records for learning to recognize symptoms automatically. In Proceedings of the 3rd Louhi Workshop on Text and Data Mining of Health Documents (pp. 43–50).

  • Kvist, M., & Velupillai, S. (2014). SCAN: A Swedish clinical abbreviation normalizer. Further development and adaptation to radiology. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 62–73). Berlin: Springer.

  • Lewis, J. D., Schinnar, R., Bilker, W. B., Wang, X., & Strom, B. L. (2007). Validation studies of the health improvement network (THIN) database for pharmacoepidemiology research. Pharmacoepidemiology and Drug Safety, 16(4), 393–401.

  • Liu, H., Lussier, Y. A., & Friedman, C. (2001). A study of abbreviations in the UMLS. In AMIA Annual Symposium Proceedings (p. 393). American Medical Informatics Association.

  • Lövestam, E., Velupillai, S., & Kvist, M. (2014). Abbreviations in Swedish clinical text - Use by three professions. Studies in Health Technology and Informatics, 205, 720–724. https://doi.org/10.3233/978-1-61499-432-9-720.

  • Marciniak, M., & Mykowiecka, A. (2014). Terminology extraction from medical texts in Polish. Journal of Biomedical Semantics, 5(1), 24.

  • Névéol, A., Dalianis, H., Savova, G., & Zweigenbaum, P. (2018). Clinical natural language processing in languages other than English: Opportunities and challenges. Journal of Biomedical Semantics, 9(12), 1–13.

  • Nguyen, A. N., Moore, J., O’Dwyer, J., & Philpot, S. (2016). Automated cancer registry notifications: validation of a medical text analytics system for identifying patients with cancer from a state-wide pathology repository. In AMIA Annual Symposium Proceedings (pp. 964–973). American Medical Informatics Association.

  • Nizamuddin, N., & Dalianis, H. (2014). Detection of spelling errors in Swedish clinical text. In 1st Nordic Workshop on Evaluation of Spellchecking and Proofing Tools (NorWEST2014), SLTC 2014.

  • Olsson, M. (2011). Vem begriper patientjournalen? [Who understands the patient record?] (In Swedish). Bachelor’s thesis, Linnaeus University.

  • Pakhomov, S., Pedersen, T., & Chute, C. G. (2005). Abbreviation and acronym disambiguation in clinical discourse. In AMIA Annual Symposium Proceedings (Vol. 2005, p. 589). American Medical Informatics Association.

  • Pantazos, K., Lauesen, S., & Lippert, S. (2016). Preserving medical correctness, readability and consistency in de-identified health records. Health Informatics Journal, 23(4), 291–303.

  • Patrick, J., & Nguyen, D. (2011). Automated proof reading of clinical notes. In PACLIC, 25th Pacific Asia Conference on Language, Information and Computation (pp. 303–312).

  • Perera, G., Broadbent, M., Callard, F., Chang, C.-K., Downs, J., Dutta, R., et al. (2016). Cohort profile of the South London and Maudsley NHS Foundation Trust Biomedical Research Centre (SLaM BRC) Case Register: current status and recent enhancement of an electronic mental health record-derived data resource. BMJ Open, 6(3), e008721.

  • Pérez, A., Weegar, R., Casillas, A., Gojenola, K., Oronoz, M., & Dalianis, H. (2017). Semi-supervised medical entity recognition: A study on Spanish and Swedish clinical corpora. Journal of Biomedical Informatics, 71, 16–30.

  • Pestian, J. P., Brew, C., Matykiewicz, P., Hovermale, D. J., Johnson, N., Cohen, K. B., et al. (2007). A shared task involving multi-label classification of clinical free text. In Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing (pp. 97–104). Association for Computational Linguistics.

  • Proux, D., Hagège, C., Gicquel, Q., Pereira, S., Darmoni, S., Segond, F., et al. (2011). Architecture and systems for monitoring hospital acquired infections inside a hospital information workflow. In Proceedings of the Workshop on Biomedical Natural Language Processing, Portland, Oregon, USA (pp. 43–48). Citeseer.

  • Roberts, A., Gaizauskas, R., Hepple, M., Demetriou, G., Guo, Y., Roberts, I., et al. (2009). Building a semantically annotated corpus of clinical texts. Journal of Biomedical Informatics, 42(5), 950–966.

  • Roller, R., Uszkoreit, H., Xu, F., Seiffe, L., Mikhailov, M., Staeck, O., et al. (2016). A fine-grained corpus annotation schema of German nephrology records. In Proceedings of the Clinical Natural Language Processing Workshop, Osaka, Japan, December 11–17 (pp. 69–77).

  • Ruch, P., Baud, R., & Geissbühler, A. (2003). Using lexical disambiguation and named-entity recognition to improve spelling correction in the electronic patient record. Artificial Intelligence in Medicine, 29(1), 169–184.

  • Saeed, M., Villarroel, M., Reisner, A. T., Clifford, G., Lehman, L.-W., Moody, G., et al. (2011). Multiparameter intelligent monitoring in intensive care II (MIMIC-II): A public-access intensive care unit database. Critical Care Medicine, 39(5), 952.

  • Saurí, R., & Pustejovsky, J. (2009). Factbank: A corpus annotated with event factuality. Language Resources and Evaluation, 43(3), 227–268.

  • Siklósi, B., Novák, A., & Prószéky, G. (2014). Resolving abbreviations in clinical texts without pre-existing structured resources. In Fourth Workshop on Building and Evaluating Resources for Health and Biomedical Text Processing, LREC (Vol. 2014).

  • Skeppstedt, M., Kvist, M., & Dalianis, H. (2012). Rule-based entity recognition and coverage of SNOMED CT in Swedish clinical text. In Proceedings of the Eighth International Conference on Language Resources and Evaluation, LREC 2012 (pp. 1250–1257).

  • Spat, S., Cadonna, B., Rakovac, I., Gütl, C., Leitner, H., Stark, G., et al. (2008). Enhanced information retrieval from narrative German-language clinical text documents using automated document classification. Studies in Health Technology and Informatics, 136, 473.

  • Velupillai, S. (2011). Automatic classification of factuality levels: A case study on Swedish diagnoses and the impact of local context. In Fourth International Symposium on Languages in Biology and Medicine, LBM 2011.

  • Velupillai, S. (2012). Shades of Certainty: Annotation and Classification of Swedish Medical Records. PhD thesis, Stockholm University.

  • Velupillai, S., Dalianis, H., & Kvist, M. (2011). Factuality levels of diagnoses in Swedish clinical text. In MIE-Medical Informatics Europe (pp. 559–563). https://doi.org/10.3233/978-1-60750-806-9-559.

  • Vincze, V., Szarvas, G., Farkas, R., Móra, G., & Csirik, J. (2008). The BioScope Corpus: Biomedical texts annotated for uncertainty, negation and their scopes. BMC Bioinformatics, 9(Suppl 11), S9.

  • Weegar, R., & Dalianis, H. (2015). Creating a rule based system for text mining of Norwegian breast cancer pathology reports. In Sixth International Workshop in Health Text Mining and Information Analysis (LOUHI), Held in Conjunction with EMNLP 2015, Lisbon, Portugal (pp. 73–78).

  • Wu, Y., Rosenbloom, S. T., Denny, J. C., Miller, R. A., Mani, S., Giuse, D. A., et al. (2011). Detecting abbreviations in discharge summaries using machine learning methods. In AMIA Annual Symposium Proceedings (Vol. 2011, p. 1541). American Medical Informatics Association.

  • Zhang, S., Kang, T., Zhang, X., Wen, D., Elhadad, N., & Lei, J. (2016). Speculation detection for Chinese clinical notes: Impacts of word segmentation and embedding models. Journal of Biomedical Informatics, 60, 334–341.

  • Zubke, M. (2017). Classification based extraction of numeric values from clinical narratives. In Proceedings of RANLP Workshop on Biomedical Natural Language Processing (pp. 24–31).

Author information

Authors and Affiliations

  1. DSV-Stockholm University, Kista, Sweden

    Hercules Dalianis

Rights and permissions

This chapter is published under an open access license. Please check the 'Copyright Information' section either on this page or in the PDF for details of this license and what re-use is permitted. If your intended use exceeds what is permitted by the license or if you are unable to locate the license and re-use information, please contact the Rights and Permissions team.

Copyright information

© 2018 The Author(s)

About this chapter

Cite this chapter

Dalianis, H. (2018). Characteristics of Patient Records and Clinical Corpora. In: Clinical Text Mining. Springer, Cham. https://doi.org/10.1007/978-3-319-78503-5_4

  • DOI: https://doi.org/10.1007/978-3-319-78503-5_4

  • Published: 15 May 2018

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-78502-8

  • Online ISBN: 978-3-319-78503-5

  • eBook Packages: Computer Science, Computer Science (R0)
