Abstract
In this paper, we empirically evaluate the quality of statistical inference from differentially-private synthetic contingency tables. We compare three methods: histogram perturbation, the Dirichlet-Multinomial synthesizer, and the Hardt-Ligett-McSherry algorithm. We consider a goodness-of-fit test for models suitable for the real data, as well as a model selection procedure. We find that the theoretical guarantees associated with these differentially-private datasets do not always translate into guarantees about statistical inference on the synthetic datasets.
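As an illustration of the simplest of the three methods, histogram perturbation can be sketched as adding independent Laplace noise to each cell count of the contingency table. The sketch below is not taken from the paper. The function names, the noise scale of 2/epsilon (which assumes neighbouring datasets differ by replacing one record, so two cells each change by one), and the clip-and-rescale post-processing step are all assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_histogram(counts, epsilon, rng=None):
    """Return a noisy, non-negative table with the same total as `counts`.

    Hypothetical helper: assumes sensitivity 2 (replace-one-record
    neighbours), so each cell receives Laplace(2 / epsilon) noise.
    """
    rng = rng or random.Random()
    scale = 2.0 / epsilon  # assumed sensitivity of 2
    noisy = [c + laplace_noise(scale, rng) for c in counts]

    # Assumed post-processing: clip negatives to zero, then rescale so the
    # synthetic table keeps the original sample size n.
    clipped = [max(0.0, v) for v in noisy]
    n = sum(counts)
    total = sum(clipped)
    if total == 0:
        # Degenerate case: fall back to a uniform table.
        return [n / len(counts)] * len(counts)
    return [n * v / total for v in clipped]


# Usage: a flattened 2x2 table with 100 observations.
table = [30, 10, 5, 55]
synthetic = perturb_histogram(table, epsilon=1.0, rng=random.Random(0))
print(synthetic)
```

Smaller values of epsilon give stronger privacy but larger noise, which is where inference on the synthetic table (for example, a goodness-of-fit test) can start to diverge from inference on the real data.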
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Charest, AS. (2012). Empirical Evaluation of Statistical Inference from Differentially-Private Contingency Tables. In: Domingo-Ferrer, J., Tinnirello, I. (eds) Privacy in Statistical Databases. PSD 2012. Lecture Notes in Computer Science, vol 7556. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33627-0_20
DOI: https://doi.org/10.1007/978-3-642-33627-0_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33626-3
Online ISBN: 978-3-642-33627-0
eBook Packages: Computer Science, Computer Science (R0)