
Data Confidentiality

  • Reference work entry
Health Services Evaluation

Part of the book series: Health Services Research (HEALTHSR)

Abstract

When medical data are collected and disseminated for research purposes, the organization that releases the data has an ethical, and in most cases a legal, responsibility to protect the confidentiality of the individuals whose records are included. Striking a balance between getting data to researchers and maintaining this confidentiality is becoming increasingly difficult. Methods developed in the field of statistical disclosure control aim to thwart potential disclosures of private information while still allowing researchers to use the data. This chapter surveys the main types of disclosure risk, gives an overview of widely used disclosure control methods, and describes the most common techniques for measuring privacy.
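One commonly used way of measuring privacy in a released table is k-anonymity (Sweeney 2002): records that share the same combination of quasi-identifier values (attributes such as ZIP code or age that could be linked to outside data) form an equivalence class, and the table is k-anonymous when every class contains at least k records. The Python sketch below computes that k for a list of records. It is an illustrative sketch only, not the chapter's own code. The function name `k_anonymity`, the field names, and the toy records are all hypothetical, and the example assumes that the quasi-identifiers have already been generalized (coarsened) before release.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence


def k_anonymity(records: Iterable[Mapping[str, object]],
                quasi_identifiers: Sequence[str]) -> int:
    """Return the largest k for which `records` are k-anonymous.

    Records with identical values on every quasi-identifier form one
    equivalence class; k is the size of the smallest such class.
    (Hypothetical helper written for illustration.)
    """
    # Count how many records fall into each combination of quasi-identifier values.
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    if not classes:
        raise ValueError("no records supplied")
    return min(classes.values())


# Hypothetical toy records; ZIP code and age are already generalized.
records = [
    {"zip": "021**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "021**", "age": "30-39", "diagnosis": "diabetes"},
    {"zip": "021**", "age": "40-49", "diagnosis": "asthma"},
    {"zip": "021**", "age": "40-49", "diagnosis": "flu"},
    {"zip": "022**", "age": "30-39", "diagnosis": "flu"},
]

# The (022**, 30-39) class contains a single record, so the table is only 1-anonymous.
print(k_anonymity(records, ["zip", "age"]))  # → 1
```

Dropping or further generalizing the lone 022** record would raise k to 2. k-anonymity says nothing about the sensitive column itself: every member of a class could still share one diagnosis. Extensions such as l-diversity and t-closeness were proposed to address that weakness.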

Author information

Correspondence to Ofer Harel.

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

About this entry

Cite this entry

Henle, T., Matthews, G.J., Harel, O. (2019). Data Confidentiality. In: Levy, A., Goring, S., Gatsonis, C., Sobolev, B., van Ginneken, E., Busse, R. (eds) Health Services Evaluation. Health Services Research. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-8715-3_28

  • DOI: https://doi.org/10.1007/978-1-4939-8715-3_28

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4939-8714-6

  • Online ISBN: 978-1-4939-8715-3

  • eBook Packages: Medicine, Reference Module Medicine
