Ethics and Privacy of Patient Records for Clinical Text Mining Research

  • Hercules Dalianis
Open Access


This chapter discusses the ethical issues that arise when working with sensitive material such as patient records, how to apply for ethical permission, how to store sensitive data safely, and other privacy-related topics.



Copyright information

© The Author(s) 2018

Open Access. This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Authors and Affiliations

  • Hercules Dalianis, DSV-Stockholm University, Kista, Sweden
