Abstract
In some data analysis problems, the available data take the form of a histogram; such data are called binned data. This paper addresses the problem of clustering binned data using mixture models. Cadez et al. [2] proposed a specific EM algorithm for such data, but it is computationally expensive. This paper proposes a classification version of that algorithm, which is much faster. The two approaches are compared on simulated data. The simulation results show that both algorithms produce comparable partitions when the histogram is accurate enough.
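As a rough illustration of the setting, and not the authors' implementation, the sketch below evaluates a one-dimensional Gaussian mixture on binned data and performs a hard (classification-style) assignment of bins to components. The function names, the bin edges, the counts, and the mixture parameters are all hypothetical, and the sketch assumes the bins cover nearly all of the probability mass, so truncation is ignored.

```python
import math

def normal_cdf(x, mean, std):
    """Cumulative distribution function of a univariate Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def bin_masses(edges, weights, means, stds):
    """masses[r][k] = weight_k * P(x falls in bin r | component k)."""
    masses = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        row = [w * (normal_cdf(hi, m, s) - normal_cdf(lo, m, s))
               for w, m, s in zip(weights, means, stds)]
        masses.append(row)
    return masses

def binned_log_likelihood(counts, masses):
    """Log-likelihood of the bin counts, up to an additive constant."""
    return sum(n * math.log(sum(row))
               for n, row in zip(counts, masses) if n > 0)

def classify_bins(masses):
    """Hard assignment: each bin goes to the component with the largest mass."""
    return [max(range(len(row)), key=row.__getitem__) for row in masses]

# Hypothetical histogram: six bins of width 2 and their observed counts.
edges = [-4.0, -2.0, 0.0, 2.0, 4.0, 6.0, 8.0]
counts = [12, 38, 35, 20, 40, 15]

# Hypothetical two-component mixture with equal proportions.
masses = bin_masses(edges, weights=[0.5, 0.5], means=[0.0, 5.0], stds=[1.0, 1.0])
labels = classify_bins(masses)
loglik = binned_log_likelihood(counts, masses)
print(labels, loglik)
```

A full EM or classification EM procedure would alternate such an assignment (or its soft, posterior-probability counterpart) with parameter updates. The updates are omitted here because, for binned data, they require integrals over each bin and are the costly part of the procedure.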
References
1. McLachlan, G.J., Jones, P.N.: Fitting mixture models to grouped and truncated data via the EM algorithm. Biometrics 44(2), 571–578 (1988)
2. Cadez, I.V., Smyth, P., McLachlan, G.J., McLaren, C.E.: Maximum likelihood estimation of mixture densities for binned and truncated multivariate data. Machine Learning 47, 7–34 (2001)
3. Celeux, G., Govaert, G.: A classification EM algorithm for clustering and two stochastic versions. Computational Statistics and Data Analysis 14, 315–332 (1992)
4. Celeux, G., Govaert, G.: Gaussian parsimonious clustering models. Pattern Recognition 28(5), 781–793 (1995)
5. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. Royal Stat. Soc. B 39(1), 1–38 (1977)
6. Symons, M.J.: Clustering criteria and multivariate normal mixtures. Biometrics 37, 35–43 (1981)
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Samé, A., Ambroise, C., Govaert, G. (2003). A Mixture Model Approach for Binned Data Clustering. In: Berthold, M.R., Lenz, H.-J., Bradley, E., Kruse, R., Borgelt, C. (eds) Advances in Intelligent Data Analysis V. IDA 2003. Lecture Notes in Computer Science, vol 2810. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45231-7_25
DOI: https://doi.org/10.1007/978-3-540-45231-7_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40813-0
Online ISBN: 978-3-540-45231-7
eBook Packages: Springer Book Archive