A Mixture Model Approach for Binned Data Clustering

Samé, Allou; Ambroise, Christophe; Govaert, Gérard

doi:10.1007/978-3-540-45231-7_25

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2810)

Abstract

In some data analysis problems, the available data take the form of a histogram. Such data are also called binned data. This paper addresses the problem of clustering binned data using mixture models. A specific EM algorithm has been proposed by Cadez et al. [2] to deal with these data, but it has the disadvantage of being computationally expensive. In this paper, a classification version of this algorithm is proposed, which is much faster. The two approaches are compared using simulated data. The simulation results show that both algorithms produce comparable partitions if the histogram is accurate enough.
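To make the binned-data setting concrete, here is a minimal sketch. It is neither the authors' implementation nor the algorithm of Cadez et al. It assumes a univariate Gaussian mixture with given parameters and a histogram described by bin edges and counts. All function names (`normal_cdf`, `bin_posteriors`, `classify_bins`, `update_proportions`) are hypothetical. The sketch shows how each bin receives posterior component probabilities (the soft quantities an EM-style method works with) and how a classification-style variant could instead assign each bin to a single component. The updates of means and variances, which require moments of Gaussians truncated to each bin, are deliberately left out.

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a univariate Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def bin_posteriors(edges, weights, means, stds):
    """Posterior probability of each component for each bin.

    Bin r covers [edges[r], edges[r + 1]). The posterior of component k is
    its weighted probability mass on the bin, normalised over components.
    """
    posteriors = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mass = [
            w * (normal_cdf(hi, m, s) - normal_cdf(lo, m, s))
            for w, m, s in zip(weights, means, stds)
        ]
        total = sum(mass)
        if total > 0.0:
            posteriors.append([v / total for v in mass])
        else:
            # Assumption: fall back to a uniform posterior when no component
            # puts numerically positive mass on the bin.
            posteriors.append([1.0 / len(mass)] * len(mass))
    return posteriors


def classify_bins(posteriors):
    """Hard assignment: index of the most probable component for each bin."""
    return [max(range(len(p)), key=p.__getitem__) for p in posteriors]


def update_proportions(counts, posteriors):
    """Soft update of the mixing proportions from the bin counts."""
    n = sum(counts)
    k = len(posteriors[0])
    return [
        sum(c * p[j] for c, p in zip(counts, posteriors)) / n
        for j in range(k)
    ]


# Illustrative histogram (made-up counts) and two-component mixture.
edges = [-4.0, -2.0, 0.0, 2.0, 4.0, 6.0, 8.0]
counts = [2, 30, 68, 68, 30, 2]
post = bin_posteriors(edges, [0.5, 0.5], [0.0, 4.0], [1.0, 1.0])
labels = classify_bins(post)
proportions = update_proportions(counts, post)
```

Working at the level of bins rather than individual observations means the cost of one pass depends on the number of bins and components, not on the number of original data points. The hard assignment replaces the weighted sums over all components with a single label per bin.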

References

[1] McLachlan, G.J., Jones, P.N.: Fitting mixture models to grouped and truncated data via the EM algorithm. Biometrics 44(2), 571–578 (1988)
[2] Cadez, I.V., Smyth, P., McLachlan, G.J., McLaren, C.E.: Maximum likelihood estimation of mixture densities for binned and truncated multivariate data. Machine Learning 47, 7–34 (2001)
[3] Celeux, G., Govaert, G.: A classification EM algorithm for clustering and two stochastic versions. Computational Statistics and Data Analysis 14, 315–332 (1992)
[4] Celeux, G., Govaert, G.: Gaussian parsimonious clustering models. Pattern Recognition 28(5), 781–793 (1995)
[5] Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. Royal Stat. Soc. B 39(1), 1–38 (1977)
[6] Symons, M.J.: Clustering criteria and multivariate normal mixtures. Biometrics 37, 35–43 (1981)

Author information

Authors and Affiliations

Département Génie Informatique, HEUDIASYC, UMR CNRS 6599, Université de Technologie de Compiègne, BP 20529, 60205 Cedex, Compiègne
Allou Samé, Christophe Ambroise & Gérard Govaert

Editor information

Editors and Affiliations

Berkeley Initiative in Soft Computing (BISC), University of California at Berkeley, USA
Michael R. Berthold
Freie Universität Berlin, Garystr. 21, 14195, Berlin, Germany
Hans-Joachim Lenz
Department of Computer Science, University of Colorado, Boulder, Colorado, USA
Elizabeth Bradley
Otto-von-Guericke-University of Magdeburg, Germany
Rudolf Kruse
Department of Knowledge Processing and Language Engineering, University of Magdeburg, Universitätsplatz 2, 39106, Magdeburg, Germany
Christian Borgelt

About this paper

Cite this paper

Samé, A., Ambroise, C., Govaert, G. (2003). A Mixture Model Approach for Binned Data Clustering. In: Berthold, M.R., Lenz, H.-J., Bradley, E., Kruse, R., Borgelt, C. (eds) Advances in Intelligent Data Analysis V. IDA 2003. Lecture Notes in Computer Science, vol 2810. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45231-7_25

DOI: https://doi.org/10.1007/978-3-540-45231-7_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40813-0
Online ISBN: 978-3-540-45231-7
eBook Packages: Springer Book Archive
