Abstract
In this paper we propose a fast and efficient EM algorithm for model-based clustering of large databases. Drawing ideas from its stochastic descendant, the Monte Carlo EM algorithm, the method uses only a sub-sample of the entire database per iteration. Starting with smaller samples in the earlier iterations for computational efficiency, the algorithm increases the sample size intelligently in later iterations to ensure maximum accuracy of the results. The intelligent sample-size updating rule is centered around EM's well-known likelihood-ascent property and increases the sample only when no further improvement is possible based on the current sample. In several simulation studies we show the superiority of Ascent-EM over regular EM implementations. We apply the method to an example of clustering online auctions.
This research was partially funded by the NSF grant DMI-0205489
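The sample-size updating rule described in the abstract can be illustrated with a small sketch: run EM on a growing prefix of the (shuffled) data, and enlarge the sub-sample only once the likelihood stops ascending on the current one. The sketch below fits a one-dimensional Gaussian mixture; the function and parameter names (`ascent_em`, `grow`, `tol`) are illustrative assumptions, not the paper's actual implementation, and the simple per-observation log-likelihood threshold stands in for the paper's more careful ascent-based criterion.

```python
import numpy as np

def em_step(x, w, mu, sigma):
    # One EM iteration for a K-component 1-D Gaussian mixture.
    # E-step: responsibilities r[i, k] = P(component k | x[i])
    dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: closed-form weighted updates
    nk = r.sum(axis=0)
    w = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return w, mu, sigma

def loglik(x, w, mu, sigma):
    dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return np.log(dens.sum(axis=1)).sum()

def ascent_em(data, k=2, n0=100, grow=2.0, tol=1e-4, max_iter=500, rng=None):
    # Sketch of an Ascent-EM-style loop: EM on a sub-sample, growing the
    # sample only when the likelihood no longer improves on the current one.
    rng = np.random.default_rng(rng)
    data = rng.permutation(data)          # shuffle so prefixes are random sub-samples
    n = n0
    # crude initialisation from the first sub-sample
    w = np.full(k, 1.0 / k)
    mu = np.quantile(data[:n], np.linspace(0.25, 0.75, k))
    sigma = np.full(k, data[:n].std())
    ll_old = -np.inf
    for _ in range(max_iter):
        x = data[:n]
        w, mu, sigma = em_step(x, w, mu, sigma)
        ll = loglik(x, w, mu, sigma) / n  # per-observation scale
        if ll - ll_old < tol:             # no further ascent on this sample
            if n >= len(data):
                break                     # converged on the full data: stop
            n = min(int(n * grow), len(data))
            ll_old = -np.inf              # likelihoods are not comparable across samples
        else:
            ll_old = ll
    return w, mu, sigma
```

Early iterations touch only `n0` observations, so most of the EM steps are cheap; only the final iterations pay the cost of the full database, which is the computational argument the abstract makes.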
Copyright information
© 2005 Springer Science+Business Media, Inc.
Cite this paper
Jank, W. (2005). Fast and Efficient Model-Based Clustering with the Ascent-EM Algorithm. In: Golden, B., Raghavan, S., Wasil, E. (eds) The Next Wave in Computing, Optimization, and Decision Technologies. Operations Research/Computer Science Interfaces Series, vol 29. Springer, Boston, MA. https://doi.org/10.1007/0-387-23529-9_14
Print ISBN: 978-0-387-23528-8
Online ISBN: 978-0-387-23529-5