A Quantitative Analysis of the Performance and Scalability of De-identification Tools for Medical Data

Liu, Zhiming; Qamar, Nafees; Qian, Jie

doi:10.1007/978-3-642-53956-5_18

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 8315)

Abstract

Recent developments in data de-identification technologies offer sophisticated solutions to protect medical data, especially when the data is to be provided for secondary purposes such as clinical or biomedical research. To determine to what degree an approach, along with its tool, is usable and effective, this paper considers a number of de-identification tools that aim to reduce the re-identification risk of published medical data while preserving its statistical meaning. We evaluate the residual risk of re-identification by conducting an experimental evaluation of the most stable research-based tools, as applied to our Electronic Health Records (EHRs) database, to assess which tool performs better with different quasi-identifiers. Our evaluation criteria are quantitative, in contrast to other assessments, which are descriptive and qualitative. Comparing the individual disclosure risk and information loss of each published dataset, we observe that the μ-Argus tool performs better. The generalization method is also considerably better than the suppression method at reducing risk and avoiding information loss. Finally, we find that sdcMicro has the best scalability among its counterparts, as observed experimentally on a virtual dataset consisting of 33 variables and 10,000 records.
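The idea of individual re-identification risk over a set of quasi-identifiers can be illustrated with a toy calculation. The sketch below is an assumption made for illustration only: it is not the risk measure implemented by μ-Argus or sdcMicro, and the records, field names, and helper functions are hypothetical. It takes the risk of a record to be one over the number of records that share its quasi-identifier values, and shows how generalizing or suppressing a field changes that number.

```python
from collections import Counter

def record_risks(records, quasi_identifiers):
    """Per-record risk = 1 / size of the record's equivalence class
    (records sharing the same quasi-identifier values). Illustrative only."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    return [1.0 / class_sizes[k] for k in keys]

def generalize_age(records, width=10):
    """Generalization: replace an exact age with a band such as '30-39'."""
    out = []
    for r in records:
        low = (r["age"] // width) * width
        out.append({**r, "age": f"{low}-{low + width - 1}"})
    return out

def suppress(records, field):
    """Suppression: replace every value of one field with '*'."""
    return [{**r, field: "*"} for r in records]

# Hypothetical toy records, not taken from the paper's EHR database.
records = [
    {"age": 34, "zip": "10001"},
    {"age": 36, "zip": "10001"},
    {"age": 52, "zip": "10002"},
    {"age": 57, "zip": "10002"},
]
qi = ["age", "zip"]

raw = record_risks(records, qi)                  # every record is unique
gen = record_risks(generalize_age(records), qi)  # ages grouped into decades
sup = record_risks(suppress(records, "age"), qi) # age removed entirely

print(max(raw), max(gen), max(sup))
```

In this toy case both transformations halve the maximum risk, but generalization retains the age decade while suppression discards age altogether, which hints at why the two methods can differ in information loss.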

Author information

Authors and Affiliations

International Institute for Software Technology, United Nations University, Macau SAR China
Zhiming Liu, Nafees Qamar & Jie Qian

Editor information

Editors and Affiliations

Department of Computer Science, Oxford University, Wolfson Building, Parks Road, OX1 3QD, Oxford, UK
Jeremy Gibbons
Department of Mathematics, St. Francis Xavier University, P.O. Box 5000, B2G 2W5, Antigonish, NS, Canada
Wendy MacCaull

About this paper

Cite this paper

Liu, Z., Qamar, N., Qian, J. (2014). A Quantitative Analysis of the Performance and Scalability of De-identification Tools for Medical Data. In: Gibbons, J., MacCaull, W. (eds) Foundations of Health Information Engineering and Systems. FHIES 2013. Lecture Notes in Computer Science, vol 8315. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53956-5_18

DOI: https://doi.org/10.1007/978-3-642-53956-5_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53955-8
Online ISBN: 978-3-642-53956-5
eBook Packages: Computer Science, Computer Science (R0)
