A Survey of Statistical Approaches to Preserving Confidentiality of Contingency Table Entries

Part of the book series: Advances in Database Systems (ADBS, volume 34)

In the statistical literature, there has been considerable development of methods for releasing multivariate categorical data sets, where the releases take the form of marginal and conditional tables corresponding to subsets of the categorical variables. In this chapter we provide an overview of this methodology and relate it to the literature on the release of association rules, which can be viewed as conditional tables. We illustrate this with two examples. A related problem, "association rule hiding," is often studied independently in the database community.
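
To make the link between association rules and conditional tables concrete, the following minimal sketch computes a marginal table and a conditional table from a two-way contingency table, and then reads off the support and confidence of a rule. The counts, the variable names A and B, and the helper functions `margin` and `conditional` are all hypothetical and chosen only for illustration; they are not taken from the chapter.

```python
from fractions import Fraction

# Hypothetical 2x2 contingency table of counts (illustrative data only).
# Keys are (a, b): the level of variable A and the level of variable B.
counts = {(0, 0): 40, (0, 1): 10, (1, 0): 15, (1, 1): 35}


def margin(table, axis):
    """Sum the counts over the other variable to get a one-way marginal table."""
    out = {}
    for cell, n in table.items():
        out[cell[axis]] = out.get(cell[axis], 0) + n
    return out


def conditional(table, given_axis):
    """Relative frequencies of each cell given the level of `given_axis`."""
    totals = margin(table, given_axis)
    return {cell: Fraction(n, totals[cell[given_axis]]) for cell, n in table.items()}


a_margin = margin(counts, 0)              # marginal table for A
b_given_a = conditional(counts, 0)        # conditional table for B given A

# Support and confidence of the rule "A = 1 implies B = 1".
total = sum(counts.values())
support = Fraction(counts[(1, 1)], total)
confidence = Fraction(counts[(1, 1)], a_margin[1])

print("marginal of A:", a_margin)
print("support:", support, "confidence:", confidence)
```

Under these assumptions, the confidence of the rule is exactly the entry of the conditional table for B = 1 given A = 1, so releasing a set of rules with their confidences amounts to releasing entries of a conditional table.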

Copyright information

© 2008 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Fienberg, S.E., Slavkovic, A.B. (2008). A Survey of Statistical Approaches to Preserving Confidentiality of Contingency Table Entries. In: Aggarwal, C.C., Yu, P.S. (eds) Privacy-Preserving Data Mining. Advances in Database Systems, vol 34. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-70992-5_12

  • DOI: https://doi.org/10.1007/978-0-387-70992-5_12

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-70991-8

  • Online ISBN: 978-0-387-70992-5

  • eBook Packages: Computer Science (R0)
