Abstract
When medical data are collected and disseminated for research purposes, the organization that releases the data has an ethical, and in most cases a legal, responsibility to maintain the confidentiality of the data relating to the individuals involved. Striking a balance between getting data to researchers and maintaining this confidentiality is becoming increasingly difficult. Methods developed in the field of statistical disclosure control aim to thwart potential disclosures of private information while still allowing researchers to use the data. This chapter presents a survey of the main types of potential disclosure risks, an overview of widely used disclosure control methods, and the most common techniques for measuring privacy.
Copyright information
© 2019 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Henle, T., Matthews, G.J., Harel, O. (2019). Data Confidentiality. In: Levy, A., Goring, S., Gatsonis, C., Sobolev, B., van Ginneken, E., Busse, R. (eds) Health Services Evaluation. Health Services Research. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-8715-3_28
DOI: https://doi.org/10.1007/978-1-4939-8715-3_28
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-8714-6
Online ISBN: 978-1-4939-8715-3
eBook Packages: Medicine, Reference Module Medicine