Empirical Evaluation of Statistical Inference from Differentially-Private Contingency Tables

  • Conference paper
Privacy in Statistical Databases (PSD 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7556)

Abstract

In this paper, we evaluate empirically the quality of statistical inference from differentially-private synthetic contingency tables. We compare three methods: histogram perturbation, the Dirichlet-Multinomial synthesizer and the Hardt-Ligett-McSherry algorithm. We consider a goodness-of-fit test for models suitable to the real data, and a model selection procedure. We find that the theoretical guarantees associated with these differentially-private datasets do not always translate well into guarantees about the statistical inference on the synthetic datasets.
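To make the first of the three methods concrete, the following is a minimal sketch of histogram perturbation followed by a Pearson goodness-of-fit statistic. It is an illustration under stated assumptions, not the paper's implementation: the function names, the sensitivity value of 2 (neighboring tables differ by changing one record), the clip-and-rescale post-processing, the treatment of the total count as public, and the toy 2x2 table are all hypothetical choices made here.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_histogram(counts, epsilon, rng, sensitivity=2.0):
    """Add independent Laplace(sensitivity / epsilon) noise to each cell.

    Assumption: sensitivity 2, i.e. changing one record moves one unit
    from one cell to another. The post-processing below (clip at zero,
    rescale to the original total, which is treated as public) is one
    common choice, not necessarily the one used in the paper.
    """
    scale = sensitivity / epsilon
    noisy = [c + laplace_noise(scale, rng) for c in counts]
    clipped = [max(0.0, x) for x in noisy]
    n = sum(counts)
    total = sum(clipped)
    if total == 0:
        # Degenerate case: every noisy cell was non-positive.
        return [n / len(counts)] * len(counts)
    return [n * x / total for x in clipped]


def pearson_chi2_independence(table):
    """Pearson chi-square statistic for independence in an r x c table."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    rng = random.Random(0)
    real = [[30, 10], [10, 30]]  # hypothetical 2x2 table
    flat = [c for row in real for c in row]
    synth_flat = perturb_histogram(flat, epsilon=1.0, rng=rng)
    synth = [synth_flat[0:2], synth_flat[2:4]]
    print("statistic on real table:     ", pearson_chi2_independence(real))
    print("statistic on synthetic table:", pearson_chi2_independence(synth))
```

Running the script with several values of `epsilon` shows how far the statistic computed on the perturbed table can drift from the one computed on the real table, which is the kind of gap between privacy guarantees and inferential quality that the abstract describes.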


Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Charest, AS. (2012). Empirical Evaluation of Statistical Inference from Differentially-Private Contingency Tables. In: Domingo-Ferrer, J., Tinnirello, I. (eds) Privacy in Statistical Databases. PSD 2012. Lecture Notes in Computer Science, vol 7556. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33627-0_20

  • DOI: https://doi.org/10.1007/978-3-642-33627-0_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33626-3

  • Online ISBN: 978-3-642-33627-0

  • eBook Packages: Computer Science, Computer Science (R0)
