Abstract
In this paper we propose a fast and efficient EM algorithm for model-based clustering of large databases. Drawing ideas from its stochastic descendant, the Monte Carlo EM algorithm, the method uses only a sub-sample of the entire database per iteration. Starting with smaller samples in the earlier iterations for computational efficiency, the algorithm increases the sample size intelligently in later iterations to ensure maximum accuracy of the results. The intelligent sample-size updating rule is centered around EM's well-known likelihood-ascent property and increases the sample only when no further improvement is possible based on the current sample. In several simulation studies we show the superiority of Ascent-EM over regular EM implementations. We apply the method to an example of clustering online auctions.
This research was partially funded by the NSF grant DMI-0205489
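The sample-size updating rule described in the abstract can be illustrated with a small sketch: run EM on a growing prefix of the (shuffled) data, and enlarge the sub-sample only once the likelihood stops ascending on the current one. The sketch below fits a one-dimensional Gaussian mixture; the function and parameter names (`ascent_em`, `grow`, `tol`) are illustrative assumptions, not the paper's actual implementation, and the simple per-observation log-likelihood threshold stands in for the paper's more careful ascent-based criterion.

```python
import numpy as np

def em_step(x, w, mu, sigma):
    # One EM iteration for a K-component 1-D Gaussian mixture.
    # E-step: responsibilities r[i, k] = P(component k | x[i])
    dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: closed-form weighted updates
    nk = r.sum(axis=0)
    w = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return w, mu, sigma

def loglik(x, w, mu, sigma):
    dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return np.log(dens.sum(axis=1)).sum()

def ascent_em(data, k=2, n0=100, grow=2.0, tol=1e-4, max_iter=500, rng=None):
    # Sketch of an Ascent-EM-style loop: EM on a sub-sample, growing the
    # sample only when the likelihood no longer improves on the current one.
    rng = np.random.default_rng(rng)
    data = rng.permutation(data)          # shuffle so prefixes are random sub-samples
    n = n0
    # crude initialisation from the first sub-sample
    w = np.full(k, 1.0 / k)
    mu = np.quantile(data[:n], np.linspace(0.25, 0.75, k))
    sigma = np.full(k, data[:n].std())
    ll_old = -np.inf
    for _ in range(max_iter):
        x = data[:n]
        w, mu, sigma = em_step(x, w, mu, sigma)
        ll = loglik(x, w, mu, sigma) / n  # per-observation scale
        if ll - ll_old < tol:             # no further ascent on this sample
            if n >= len(data):
                break                     # converged on the full data: stop
            n = min(int(n * grow), len(data))
            ll_old = -np.inf              # likelihoods are not comparable across samples
        else:
            ll_old = ll
    return w, mu, sigma
```

Early iterations touch only `n0` observations, so most of the EM steps are cheap; only the final iterations pay the cost of the full database, which is the computational argument the abstract makes.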
Copyright information
© 2005 Springer Science+Business Media, Inc.
Cite this paper
Jank, W. (2005). Fast and Efficient Model-Based Clustering with the Ascent-EM Algorithm. In: Golden, B., Raghavan, S., Wasil, E. (eds) The Next Wave in Computing, Optimization, and Decision Technologies. Operations Research/Computer Science Interfaces Series, vol 29. Springer, Boston, MA. https://doi.org/10.1007/0-387-23529-9_14
Print ISBN: 978-0-387-23528-8
Online ISBN: 978-0-387-23529-5