Abstract
Recent developments in data de-identification technologies offer sophisticated solutions for protecting medical data, especially when the data are to be provided for secondary purposes such as clinical or biomedical research. To determine to what degree an approach, along with its tool, is usable and effective, this paper considers a number of de-identification tools that aim to reduce the re-identification risk of published medical data while preserving its statistical meaning. We evaluate the residual risk of re-identification by conducting an experimental evaluation of the most stable research-based tools, applied to our Electronic Health Records (EHRs) database, to assess which tool performs better with different quasi-identifiers. Our evaluation criteria are quantitative, in contrast to other assessments that are descriptive and qualitative. Comparing the individual disclosure risk and information loss of each published dataset, we find that the μ-Argus tool performs better than the other tools evaluated. In addition, the generalization method is considerably better than the suppression method at reducing risk and avoiding information loss. We also find that sdcMicro has the best scalability among its counterparts, as observed experimentally on a virtual dataset consisting of 33 variables and 10,000 records.
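The two measures the abstract compares, individual disclosure risk and information loss, can be illustrated with a minimal Python sketch. This is not the procedure used by μ-Argus, sdcMicro, or the paper's experiments. The records, the quasi-identifiers (age and ZIP code), the 10-year age bands, and the use of "fraction of records suppressed" as a stand-in for information loss are all assumptions made for illustration.

```python
from collections import Counter

# Toy records: (age, zip_code) act as quasi-identifiers. All values are made up.
records = [
    (34, "10001"), (36, "10002"), (35, "10001"),
    (52, "20001"), (58, "20002"), (71, "30001"),
]

def generalize(record):
    """Coarsen age to a 10-year band and the ZIP code to its first 3 digits."""
    age, zip_code = record
    low = (age // 10) * 10
    return (f"{low}-{low + 9}", zip_code[:3] + "**")

def max_risk(rows):
    """Worst-case individual risk: 1 / size of the smallest equivalence class."""
    sizes = Counter(rows)
    return 1 / min(sizes.values())

def suppress_small_classes(rows, k):
    """Drop every record whose equivalence class has fewer than k members."""
    sizes = Counter(rows)
    return [r for r in rows if sizes[r] >= k]

# Every raw record is unique on its quasi-identifiers.
raw_risk = max_risk(records)

# Suppression alone (k = 2) would have to drop every raw record.
suppressed_only = suppress_small_classes(records, k=2)

# Generalizing first leaves a single unique record to suppress.
generalized = [generalize(r) for r in records]
kept = suppress_small_classes(generalized, k=2)
suppressed_fraction = 1 - len(kept) / len(records)

print("raw worst-case risk:", raw_risk)
print("records kept by suppression alone:", len(suppressed_only))
print("records kept after generalization:", len(kept))
print("worst-case risk after generalization:", max_risk(kept))
```

On this toy data, suppression alone removes every record, while generalizing first removes one record in six and halves the worst-case risk. The example only shows how the two quantities trade off; it says nothing about how the evaluated tools behave on real EHR data.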
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, Z., Qamar, N., Qian, J. (2014). A Quantitative Analysis of the Performance and Scalability of De-identification Tools for Medical Data. In: Gibbons, J., MacCaull, W. (eds) Foundations of Health Information Engineering and Systems. FHIES 2013. Lecture Notes in Computer Science, vol 8315. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53956-5_18
DOI: https://doi.org/10.1007/978-3-642-53956-5_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53955-8
Online ISBN: 978-3-642-53956-5
eBook Packages: Computer Science, Computer Science (R0)