Data Confidentiality

Henle, Theresa; Matthews, Gregory J.; Harel, Ofer

doi:10.1007/978-1-4939-8715-3_28

Part of the book series: Health Services Research (HEALTHSR)

Abstract

When medical data are collected and disseminated for research purposes, the organization that releases the data has an ethical, and in most cases a legal, responsibility to maintain the confidentiality of data relating to the individuals involved. Striking a balance between getting data to researchers and maintaining this confidentiality is becoming increasingly difficult. Methods developed in the field of statistical disclosure control aim to thwart potential disclosures of private information while still allowing researchers to use the data. This chapter presents a survey of the main types of potential disclosure risks, an overview of widely used disclosure control methods, and the most common techniques for measuring privacy.

Author information

Authors and Affiliations

Department of Mathematics and Statistics, Loyola University, Chicago, IL, USA
Theresa Henle & Gregory J. Matthews
Department of Statistics, University of Connecticut, Storrs, CT, USA
Ofer Harel

Corresponding author

Correspondence to Ofer Harel.

Editor information

Editors and Affiliations

Community Health and Epidemiology, Dalhousie University, Halifax, NS, Canada
Adrian Levy
ICON plc, Vancouver, BC, Canada
Sarah Goring
Department of Biostatistics, Brown University, Providence, RI, USA
Constantine Gatsonis
University of British Columbia, Vancouver, BC, Canada
Boris Sobolev
European Observatory on Health Systems and Policies, Department of Health Care Management, Berlin University of Technology, Berlin, Germany
Ewout van Ginneken
Department Health Care Management Faculty of Economics and Management, Technische Universität Berlin, Berlin, Germany
Reinhard Busse

About this entry

Cite this entry

Henle, T., Matthews, G.J., Harel, O. (2019). Data Confidentiality. In: Levy, A., Goring, S., Gatsonis, C., Sobolev, B., van Ginneken, E., Busse, R. (eds) Health Services Evaluation. Health Services Research. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-8715-3_28

DOI: https://doi.org/10.1007/978-1-4939-8715-3_28
Published: 12 February 2019
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-8714-6
Online ISBN: 978-1-4939-8715-3
eBook Packages: Medicine; Reference Module Medicine
