Toward an Efficient Computation of Log-Likelihood Functions in Statistical Inference: Overdispersed Count Data Clustering

Part of the book series: Unsupervised and Semi-Supervised Learning (UNSESUL)

Abstract

This work presents an unsupervised learning algorithm that uses the mesh method to compute the log-likelihood function. The multinomial Dirichlet distribution (MDD) is one of the most widely used models for multicategorical count data with overdispersion. Recently, it has been shown that traditional numerical computation of the MDD log-likelihood function is either unstable or so slow that it becomes infeasible for large datasets. We therefore propose using the mesh algorithm, which approximates the MDD log-likelihood function using Bernoulli polynomials. Moreover, we extend the mesh algorithm to compute the log-likelihood function of a more flexible distribution, namely the multinomial generalized Dirichlet (MGD). We demonstrate the efficiency of this method in statistical inference, i.e., maximum likelihood estimation, for fitting finite mixture models based on MDD and MGD, two efficient distributions for count data. In a set of experiments, the proposed approach shows its merits on two real-world clustering problems, namely natural scene categorization and facial expression recognition.
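As a rough illustration of the two traditional computations the abstract contrasts, the sketch below evaluates the MDD log-likelihood of a single count vector in two ways. It is not the chapter's mesh algorithm, and the parameterization (a concentration vector `alpha`) and the function names are assumptions made for this example only. The first form takes differences of log-gamma values, which is fast but can lose precision through cancellation when the `alpha` values are very large relative to the counts. The second expands each gamma ratio into a sum of logarithms, which avoids that cancellation but costs time proportional to the total count.

```python
import math


def mdd_loglik_lgamma(x, alpha):
    """MDD log-likelihood of one count vector via log-gamma differences.

    x     : list of non-negative integer counts, one per category
    alpha : list of positive parameters (assumed parameterization)
    """
    n = sum(x)
    a = sum(alpha)
    # Multinomial coefficient: log n! - sum_k log x_k!
    ll = math.lgamma(n + 1) - sum(math.lgamma(xk + 1) for xk in x)
    # log Gamma(A) - log Gamma(A + n), with A = sum_k alpha_k
    ll += math.lgamma(a) - math.lgamma(a + n)
    # sum_k [log Gamma(alpha_k + x_k) - log Gamma(alpha_k)]
    ll += sum(math.lgamma(ak + xk) - math.lgamma(ak)
              for xk, ak in zip(x, alpha))
    return ll


def mdd_loglik_sum(x, alpha):
    """Same quantity, using Gamma(a + m) / Gamma(a) = prod_{j<m} (a + j).

    No large log-gamma values are subtracted, but the cost grows
    linearly with the total count n.
    """
    n = sum(x)
    a = sum(alpha)
    ll = math.lgamma(n + 1) - sum(math.lgamma(xk + 1) for xk in x)
    ll -= sum(math.log(a + j) for j in range(n))
    ll += sum(math.log(ak + j)
              for xk, ak in zip(x, alpha)
              for j in range(xk))
    return ll


if __name__ == "__main__":
    counts = [3, 0, 2]
    params = [0.5, 1.2, 2.0]
    print(mdd_loglik_lgamma(counts, params))
    print(mdd_loglik_sum(counts, params))
```

For moderate parameter values the two functions agree to within floating-point rounding. The abstract's point is that neither scales well in both accuracy and speed, which is the gap the Bernoulli-polynomial approximation is meant to fill.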

Notes

  1. https://groups.csail.mit.edu/vision/SUN/

Corresponding author

Correspondence to Masoud Daghyani.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Daghyani, M., Zamzami, N., Bouguila, N. (2020). Toward an Efficient Computation of Log-Likelihood Functions in Statistical Inference: Overdispersed Count Data Clustering. In: Bouguila, N., Fan, W. (eds) Mixture Models and Applications. Unsupervised and Semi-Supervised Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-23876-6_8

  • DOI: https://doi.org/10.1007/978-3-030-23876-6_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-23875-9

  • Online ISBN: 978-3-030-23876-6

  • eBook Packages: Engineering (R0)
