Abstract
This work presents an unsupervised learning algorithm that uses the mesh method to compute the log-likelihood function. The multinomial Dirichlet distribution (MDD) is one of the most widely used models for multicategorical count data with overdispersion. Recently, it has been shown that traditional numerical computation of the MDD log-likelihood function is either unstable or so slow that it becomes infeasible for large datasets. We therefore propose to use the mesh algorithm, which approximates the MDD log-likelihood function using Bernoulli polynomials. Moreover, we extend the mesh algorithm to compute the log-likelihood function of a more flexible distribution, namely the multinomial generalized Dirichlet (MGD). We demonstrate the efficiency of this method in statistical inference, specifically maximum likelihood estimation, for fitting finite mixture models based on the MDD and MGD as distributions for count data. In a set of experiments, the proposed approach shows its merits on two real-world clustering problems: natural scene categorization and facial expression recognition.
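For orientation, the traditional computation that the abstract contrasts with the mesh algorithm can be sketched as a direct evaluation of the MDD log-probability through log-gamma functions. This is a minimal illustration and not the chapter's method: the function name `mdd_log_likelihood` and its arguments are hypothetical, and the formula used is the standard Dirichlet-multinomial probability mass function, log n! - Σ log x_k! + log Γ(A) - log Γ(A + n) + Σ [log Γ(α_k + x_k) - log Γ(α_k)], with A = Σ α_k and n = Σ x_k.

```python
import math


def mdd_log_likelihood(counts, alpha):
    """Log-probability of one count vector under a multinomial Dirichlet
    distribution, evaluated directly with log-gamma functions.

    Illustrative baseline only: this is the direct evaluation, not the
    mesh algorithm described in the abstract.
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    if any(a <= 0 for a in alpha):
        raise ValueError("all alpha parameters must be positive")

    n = sum(counts)        # total number of observed events
    a_total = sum(alpha)   # A = sum of the Dirichlet parameters

    # Log of the multinomial coefficient n! / (x_1! ... x_K!)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Log of Gamma(A) / Gamma(A + n)
    log_norm = math.lgamma(a_total) - math.lgamma(a_total + n)
    # Sum over categories of log Gamma(alpha_k + x_k) - log Gamma(alpha_k)
    log_terms = sum(
        math.lgamma(a + x) - math.lgamma(a) for x, a in zip(counts, alpha)
    )
    return log_coef + log_norm + log_terms


if __name__ == "__main__":
    # With K = 2 and alpha = (1, 1), every split of n = 5 events is equally
    # likely, so the result should be close to log(1/6).
    print(mdd_log_likelihood([3, 2], [1.0, 1.0]))
```

Each term in this form is a difference of log-gamma values. The abstract reports that this kind of direct numerical evaluation can be unstable or slow on large datasets, which is the motivation it gives for the Bernoulli-polynomial approximation.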
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Daghyani, M., Zamzami, N., Bouguila, N. (2020). Toward an Efficient Computation of Log-Likelihood Functions in Statistical Inference: Overdispersed Count Data Clustering. In: Bouguila, N., Fan, W. (eds) Mixture Models and Applications. Unsupervised and Semi-Supervised Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-23876-6_8
DOI: https://doi.org/10.1007/978-3-030-23876-6_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-23875-9
Online ISBN: 978-3-030-23876-6
eBook Packages: Engineering, Engineering (R0)