
Fast progressive training of mixture models for model selection

Journal of Intelligent Information Systems

Abstract

Finite mixture models (FMMs) are flexible models with uses as varied as density estimation, clustering, classification, modeling heterogeneity, model averaging, and handling missing data. The expectation maximization (EM) algorithm can learn maximum likelihood estimates of the model parameters, but it requires the number of mixture components to be known a priori. In practice, the number of components is often unknown, so determining it has been a central problem in mixture modelling, and mixture modelling typically becomes a two-stage process: first determining the number of components, then estimating the parameters of the mixture model. This paper proposes fast training of a series of mixture models through progressive merging of mixture components, which lets a model selection algorithm make an appropriate choice of model. The paper also proposes a fast, data-driven approximation of the Kullback–Leibler (KL) divergence as a criterion for measuring the similarity of mixture components. We apply the proposed methodology to mixture modelling of a synthetic dataset, the publicly available zoo dataset, and two chromosomal aberration datasets, showing that model selection is both efficient and effective.
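The sketch below illustrates the general idea in Python. It is a minimal sketch, not the authors' implementation: it assumes a mixture of multivariate Bernoulli distributions (a model family suited to the paper's 0-1 datasets), trains once at the largest component count with EM, and then derives each smaller model by merging the pair of components with the smallest symmetrised KL divergence, followed by a short EM refinement. Where the paper proposes a data-driven approximation of the KL divergence, the sketch substitutes the closed-form KL between multivariate Bernoulli distributions, which this family admits; the function names (em, symmetric_kl, progressive_training) and the moment-matched merge rule are illustrative choices, not taken from the paper.

    import numpy as np

    def _log_joint(X, pi, theta, eps=1e-10):
        """Per-point log of (mixing weight x Bernoulli density) per component."""
        t = np.clip(theta, eps, 1 - eps)
        return X @ np.log(t).T + (1 - X) @ np.log(1 - t).T + np.log(pi)

    def log_likelihood(X, pi, theta):
        """Log-likelihood of binary data X under the mixture (pi, theta)."""
        lp = _log_joint(X, pi, theta)
        m = lp.max(axis=1, keepdims=True)
        return float(np.sum(m.ravel() + np.log(np.exp(lp - m).sum(axis=1))))

    def em(X, pi, theta, iters=50):
        """Standard EM updates for a mixture of multivariate Bernoullis."""
        for _ in range(iters):
            lp = _log_joint(X, pi, theta)
            lp -= lp.max(axis=1, keepdims=True)
            r = np.exp(lp)
            r /= r.sum(axis=1, keepdims=True)      # E-step: responsibilities
            nk = r.sum(axis=0) + 1e-10
            pi = nk / nk.sum()                     # M-step: mixing proportions
            theta = (r.T @ X) / nk[:, None]        # M-step: Bernoulli parameters
        return pi, theta

    def symmetric_kl(p, q, eps=1e-10):
        """Closed-form symmetrised KL between two multivariate Bernoulli
        components; a stand-in for the paper's data-driven approximation."""
        p, q = np.clip(p, eps, 1 - eps), np.clip(q, eps, 1 - eps)
        return float(np.sum((p - q) * (np.log(p / q) - np.log((1 - p) / (1 - q)))))

    def progressive_training(X, k_max=10, em_iters=50, seed=0):
        """Train once with k_max components, then progressively merge the most
        similar pair and refine briefly, yielding one trained model per size."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        pi = np.full(k_max, 1.0 / k_max)
        theta = rng.uniform(0.25, 0.75, size=(k_max, d))
        pi, theta = em(X, pi, theta, iters=em_iters)
        models = {k_max: (pi, theta, log_likelihood(X, pi, theta))}
        for k in range(k_max - 1, 0, -1):
            kk = k + 1                             # current component count
            i, j = min(((a, b) for a in range(kk) for b in range(a + 1, kk)),
                       key=lambda ab: symmetric_kl(theta[ab[0]], theta[ab[1]]))
            w = pi[i] + pi[j]
            merged = (pi[i] * theta[i] + pi[j] * theta[j]) / w  # moment-matched
            keep = [c for c in range(kk) if c not in (i, j)]
            pi = np.append(pi[keep], w)
            theta = np.vstack([theta[keep], merged])
            pi, theta = em(X, pi, theta, iters=max(1, em_iters // 5))  # refit
            models[k] = (pi, theta, log_likelihood(X, pi, theta))
        return models

Given binary data X of shape (n, d), models = progressive_training(X, k_max=10) returns one trained model per component count. A standard criterion can then pick the size across the series, e.g. AIC with k*d + (k - 1) free parameters for a k-component model: best_k = min(models, key=lambda k: 2 * (k * X.shape[1] + k - 1) - 2 * models[k][2]).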



Acknowledgements

The Helsinki Doctoral Programme in Computer Science—Advanced Computing and Intelligent Systems (Hecse) and the Finnish Center of Excellence for Algorithmic Data Analysis (ALGODAN) funded this research.

Author information

Corresponding author

Correspondence to Prem Raj Adhikari.

About this article

Cite this article

Adhikari, P.R., Hollmén, J. Fast progressive training of mixture models for model selection. J Intell Inf Syst 44, 223–241 (2015). https://doi.org/10.1007/s10844-013-0282-3
