Data Confidentiality

  • Theresa Henle
  • Gregory J. Matthews
  • Ofer Harel
Reference work entry
Part of the Health Services Research book series (HEALTHSR)


When medical data are collected and disseminated for research purposes, the organization that releases the data has an ethical, and in most cases a legal, responsibility to protect the confidentiality of the individuals whose records are included. Balancing researchers' access to data against this confidentiality is becoming increasingly difficult. Methods developed in the field of statistical disclosure control aim to prevent disclosures of private information while still allowing researchers to use the data. This chapter surveys the main types of disclosure risk, gives an overview of widely used disclosure control methods, and describes the most common techniques for measuring privacy.


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Theresa Henle (1)
  • Gregory J. Matthews (1)
  • Ofer Harel (2)
  1. Department of Mathematics and Statistics, Loyola University, Chicago, USA
  2. Department of Statistics, University of Connecticut, Storrs, USA
