Algebraic Statistics and Contingency Table Problems: Log-Linear Models, Likelihood Estimation, and Disclosure Limitation

Emerging Applications of Algebraic Geometry

Part of the book series: The IMA Volumes in Mathematics and its Applications (IMA, volume 149)

Abstract

Contingency tables have provided a fertile ground for the growth of algebraic statistics. In this paper we briefly outline some features of this work and point to open research problems. We focus on the problem of maximum likelihood estimation for log-linear models and a related problem of disclosure limitation to protect the confidentiality of individual responses. Risk of disclosure has often been measured either formally or informally in terms of information contained in marginal tables linked to a log-linear model and has focused on the disclosure potential of small cell counts, especially those equal to 1 or 2. One way to assess the risk is to compute bounds for cell entries given a set of released marginals. Both of these methodologies become complicated for large sparse tables. This paper revisits the problem of computing bounds for cell entries and picks up on a theme first suggested in Fienberg [21] that there is an intimate link between the ideas on bounds and the existence of maximum likelihood estimates, and shows how these ideas can be made rigorous through the underlying mathematics of the same geometric/algebraic framework. We illustrate the linkages through a series of examples. We also discuss the more complex problem of releasing marginal and conditional information. We illustrate the statistical features of the methodology on two examples and then conclude with a series of open problems.
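
As a minimal illustration of computing bounds for cell entries given released marginals, the sketch below applies the classical Fréchet bounds to a two-way table whose row and column totals are known. The function name `frechet_bounds` and the example totals are assumptions made for illustration; they do not come from the chapter, which treats far more general sets of marginals.

```python
from typing import List, Tuple


def frechet_bounds(
    row_totals: List[int], col_totals: List[int]
) -> List[List[Tuple[int, int]]]:
    """Return (lower, upper) Frechet bounds for each cell of a two-way table.

    Hypothetical helper for illustration: given only the released row and
    column totals, cell (i, j) must satisfy
        max(0, r_i + c_j - n) <= n_ij <= min(r_i, c_j),
    where n is the grand total.
    """
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must share the same grand total")
    return [
        [(max(0, r + c - n), min(r, c)) for c in col_totals]
        for r in row_totals
    ]


# Illustrative totals (assumed, not taken from the chapter).
bounds = frechet_bounds([3, 7], [4, 6])
for row in bounds:
    print(row)
```

In this two-way setting, a cell whose lower and upper bounds are close together, especially around a small count, is one that the released marginals nearly reveal.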

AMS(MOS) subject classifications. 13P10, 62B05, 62H17, 62P25.

Supported in part by NSF grants EIA9876619 and IIS0131884 to the National Institute of Statistical Sciences, and NSF Grant DMS-0631589 and Army contract DAAD19-02-1-3-0389 to Carnegie Mellon University.

Supported in part by NSF Grant DMS-0631589 and a grant from the Pennsylvania Department of Health through the Commonwealth Universal Research Enhancement Program to Carnegie Mellon University.

Supported in part by NSF grants EIA9876619 and IIS0131884 to the National Institute of Statistical Sciences and SES-0532407 to Pennsylvania State University.

Supported by Army contract DAAD19-02-1-3-0389 to Carnegie Mellon University.

References

  1. Y.M.M. BISHOP, S.E. FIENBERG, AND P.W. HOLLAND (1975). Discrete Multivariate Analysis: Theory and Practice, MIT Press, Cambridge, MA. Reprinted (2007), Springer-Verlag, New York.

  2. L. BUZZIGOLI AND A. GIUSTI (1999). An algorithm to calculate the lower and upper bounds of the elements of an array given its marginals, in Proceedings of the Conference on Statistical Data Protection. Luxembourg: Eurostat, pp. 131-147.

  3. E. CARLINI AND F. RAPALLO (2005). The geometry of statistical models for two-way contingency tables with fixed odds ratios, Rendiconti dell'Istituto di Matematica dell'Università di Trieste, 37:71-84.

  4. S.K. CHRISTIANSEN AND H. GIESE (1991). Genetic analysis of obligate barley powdery mildew fungus based on RFLP and virulence loci, Theoretical and Applied Genetics, 79:705-712.

  5. Y. CHEN, I.R. DINWOODIE, AND S. SULLIVANT (2006). Sequential importance sampling for multiway tables, Annals of Statistics, 34:523-545.

  6. L.H. COX (2002). Bounds on entries in 3-dimensional contingency tables subject to given marginal totals, in J. Domingo-Ferrer (Ed.), Inference Control in Statistical Databases, Springer-Verlag LNCS 2316, pp. 21-33.

  7. L.H. COX (2003). On properties of multi-dimensional statistical tables, Journal of Statistical Planning and Inference, 117:251-273.

  8. J.A. DE LOERA, R. HEMMECKE, J. TAUZER, AND R. YOSHIDA (2004). Effective lattice point counting in rational convex polytopes, Journal of Symbolic Computation, 38:1273-1302.

  9. J.A. DE LOERA AND S. ONN (2006). Markov bases of 3-way tables are arbitrarily complicated, Journal of Symbolic Computation, 41:173-181.

  10. P. DIACONIS AND B. STURMFELS (1998). Algebraic algorithms for sampling from conditional distributions, Annals of Statistics, 26:363-397.

  11. A. DOBRA (2002). Statistical Tools for Disclosure Limitation in Multi-way Contingency Tables. Ph.D. Dissertation, Department of Statistics, Carnegie Mellon University.

  12. A. DOBRA (2003). Markov bases for decomposable graphical models, Bernoulli, 9(6):1-16.

  13. A. DOBRA AND S.E. FIENBERG (2000). Bounds for cell entries in contingency tables given marginal totals and decomposable graphs, Proceedings of the National Academy of Sciences, 97:11885-11892.

  14. A. DOBRA AND S.E. FIENBERG (2001). Bounds for cell entries in contingency tables induced by fixed marginal totals with applications to disclosure limitation, Statistical Journal of the United Nations ECE, 18:363-371.

  15. A. DOBRA AND S.E. FIENBERG (2003). Bounding entries in multi-way contingency tables given a set of marginal totals, in Y. Haitovsky, H.R. Lerche, and Y. Ritov, eds., Foundations of Statistical Inference: Proceedings of the Shoresh Conference 2000, Physica-Verlag, pp. 3-16.

  16. A. DOBRA AND S.E. FIENBERG (2008). The generalized shuttle algorithm, in P. Gibilisco, Eva Riccomagno, Maria-Piera Rogantin (eds.) Algebraic and Geometric Methods in Probability and Statistics, Cambridge University Press, to appear.

  17. A. DOBRA, S.E. FIENBERG, AND M. TROTTINI (2003). Assessing the risk of disclosure of confidential categorical data, in J. Bernardo et al., eds., Bayesian Statistics 7, Oxford University Press, pp. 125-144.

  18. P. DOYLE, J. LANE, J. THEEUWES, AND L. ZAYATZ (eds.) (2001). Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies. Elsevier.

  19. D. EDWARDS (1992). Linkage analysis using log-linear models, Computational Statistics and Data Analysis, 10:281-290.

  20. N. ERIKSSON, S.E. FIENBERG, A. RINALDO, AND S. SULLIVANT (2006). Polyhedral conditions for the non-existence of the MLE for hierarchical log-linear models, Journal of Symbolic Computation, 41:222-233.

  21. S.E. FIENBERG (1999). Fréchet and Bonferroni bounds for multi-way tables of counts with applications to disclosure limitation, in Statistical Data Protection, Proceedings of the Conference, Lisbon, Eurostat, pp. 115-131.

  22. S.E. FIENBERG, U.E. MAKOV, M.M. MEYER, AND R.J. STEELE (2001). Computing the exact distribution for a multi-way contingency table conditional on its marginal totals, in A.K.M.E. Saleh, ed., Data Analysis from Statistical Foundations: A Festschrift in Honor of the 75th Birthday of D. A. S. Fraser, Nova Science Publishers, Huntington, NY, pp. 145-165.

  23. S.E. FIENBERG AND A. RINALDO (2006). Computing maximum likelihood estimates in log-linear models, Technical Report 835, Department of Statistics, Carnegie Mellon University.

  24. S.E. FIENBERG AND A. RINALDO (2007). Three centuries of categorical data analysis: log-linear models and maximum likelihood estimation, Journal of Statistical Planning and Inference, 137:3430-3445.

  25. S.E. FIENBERG AND A.B. SLAVKOVIC (2004a). Making the release of confidential data from multi-way tables count, Chance, 17(3):5-10.

  26. S.E. FIENBERG AND A.B. SLAVKOVIC (2005). Preserving the confidentiality of categorical databases when releasing information for association rules, Data Mining and Knowledge Discovery, 11:155-180.

  27. L. GARCIA, M. STILLMAN, AND B. STURMFELS (2005). Algebraic geometry for Bayesian networks, Journal of Symbolic Computation, 39:331-355.

  28. E. GAWRILOW AND M. JOSWIG (2005). Geometric reasoning with polymake, Manuscript available at arXiv:math.CO/0507273.

  29. D. GEIGER, C. MEEK, AND B. STURMFELS (2006). On the toric algebra of graphical models, Annals of Statistics, 34:1463-1492.

  30. S.J. HABERMAN (1974). The Analysis of Frequency Data, University of Chicago Press, Chicago, Illinois.

  31. S. HOŞTEN AND B. STURMFELS (2006). Computing the integer programming gap, Combinatorica, 27:367-382.

  32. S.L. LAURITZEN (1996). Graphical Models, Oxford University Press, New York.

  33. R.B. NELSEN (2006). An Introduction to Copulas. Springer-Verlag, New York.

  34. A. RINALDO (2005). Maximum Likelihood Estimation for Log-linear Models. Ph.D. Dissertation, Department of Statistics, Carnegie Mellon University.

  35. A. RINALDO (2006). On maximum likelihood estimation for log-linear models, submitted for publication.

  36. F. SANTOS AND B. STURMFELS (2003). Higher Lawrence configurations, J. Combin. Theory Ser. A, 103:151-164.

  37. A.B. SLAVKOVIC (2004). Statistical Disclosure Limitation Beyond the Margins: Characterization of Joint Distributions for Contingency Tables. Ph.D. Dissertation, Department of Statistics, Carnegie Mellon University.

  38. A.B. SLAVKOVIC AND B. SMUCKER (2007). Calculating Cell Bounds in Contingency Tables Based on Conditional Frequencies, Technical Report, Department of Statistics, Penn State University.

  39. A.B. SLAVKOVIC AND S.E. FIENBERG (2004). Bounds for Cell Entries in Two-way Tables Given Conditional Relative Frequencies, in J. Domingo-Ferrer and V. Torra (eds.), Privacy in Statistical Databases, Lecture Notes in Computer Science No. 3050, Springer-Verlag, New York, pp. 30-43.

  40. A.B. SLAVKOVIC AND S.E. FIENBERG (2008). The algebraic geometry of 2 × 2 contingency tables, forthcoming.

  41. B. STURMFELS (1995). Gröbner Bases and Convex Polytopes, American Mathematical Society, University Lecture Series, 8.

  42. S. SULLIVANT (2006). Compressed polytopes and statistical disclosure limitation, Tohoku Mathematical Journal, 58(3):433-445.

  43. S. SULLIVANT (2005). Small contingency tables with large gaps, SIAM Journal on Discrete Mathematics, 18(4):787-793.

  44. G.M. ZIEGLER (1998). Lectures on Polytopes, Springer-Verlag, New York.

Author information

Corresponding author

Correspondence to Adrian Dobra.

Copyright information

© 2009 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Dobra, A., Fienberg, S.E., Rinaldo, A., Slavkovic, A., Zhou, Y. (2009). Algebraic Statistics and Contingency Table Problems: Log-Linear Models, Likelihood Estimation, and Disclosure Limitation. In: Putinar, M., Sullivant, S. (eds) Emerging Applications of Algebraic Geometry. The IMA Volumes in Mathematics and its Applications, vol 149. Springer, New York, NY. https://doi.org/10.1007/978-0-387-09686-5_3
