Non-parametric Mixture Models for Clustering

Mallapragada, Pavan Kumar; Jin, Rong; Jain, Anil

doi:10.1007/978-3-642-14980-1_32

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 6218)

Included in the following conference series:

Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR)

Abstract

Mixture models have been widely used for data clustering. However, commonly used mixture models are generally of a parametric form (e.g., a mixture of Gaussian distributions, or GMM), which significantly limits their capacity to fit the diverse multidimensional data distributions encountered in practice. We propose a non-parametric mixture model (NMM) for data clustering that detects clusters generated from arbitrary unknown distributions, using non-parametric kernel density estimates. The proposed model is non-parametric because the generative distribution of each data point depends only on the rest of the data points and the chosen kernel. The parameters of the model are estimated by leave-one-out likelihood maximization. When applied to clustering high-dimensional text datasets, the NMM approach significantly outperforms both state-of-the-art and classical approaches such as K-means, Gaussian mixture models, spectral clustering, and linkage methods.
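The leave-one-out idea mentioned in the abstract can be illustrated on a plain kernel density estimate. The sketch below is not the authors' NMM and does not cluster anything: it only shows how the density at each point is computed from the remaining points and a kernel, and how a leave-one-out log-likelihood can then be maximized over a parameter. The Gaussian kernel, the single shared bandwidth, and all function names are assumptions made for illustration.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """Isotropic Gaussian kernel between two points given as tuples."""
    d = len(x)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    norm = (2.0 * math.pi * bandwidth ** 2) ** (d / 2.0)
    return math.exp(-0.5 * sq_dist / bandwidth ** 2) / norm


def loo_density(points, i, bandwidth):
    """Density at points[i] estimated from every point except points[i]."""
    others = [p for j, p in enumerate(points) if j != i]
    total = sum(gaussian_kernel(points[i], p, bandwidth) for p in others)
    return total / len(others)


def loo_log_likelihood(points, bandwidth):
    """Sum of leave-one-out log densities over all points."""
    total = 0.0
    for i in range(len(points)):
        density = loo_density(points, i, bandwidth)
        if density <= 0.0:
            # The kernel underflowed to zero: treat as an impossible fit.
            return float("-inf")
        total += math.log(density)
    return total


def select_bandwidth(points, candidates):
    """Pick the candidate bandwidth with the highest leave-one-out likelihood."""
    return max(candidates, key=lambda h: loo_log_likelihood(points, h))


if __name__ == "__main__":
    # Toy 2-D data with two loose groups (made up for this example).
    data = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (3.0, 3.1), (3.2, 2.9)]
    best = select_bandwidth(data, [0.05, 0.2, 0.5, 1.0, 2.0])
    print("selected bandwidth:", best)
```

Leaving each point out of its own density estimate matters here: if a point were allowed to contribute to its own density, the likelihood would grow without bound as the bandwidth shrinks to zero, so maximizing it would be meaningless.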

This research was partially supported by ONR grant no. N000140710225 and NSF grant no. IIS-0643494. Part of Anil Jain’s research was supported by the WCU (World Class University) program through the National Research Foundation of Korea, funded by the Ministry of Education, Science and Technology (R31-2008-000-10008-0).

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, 48824
Pavan Kumar Mallapragada, Rong Jin & Anil Jain

Editor information

Editors and Affiliations

Computer Vision and Pattern Recognition Group, Computer Science, University of York, Heslington, York, YO10 5DD, United Kingdom
Edwin R. Hancock
Department of Computer Science, University of York, YO10 5DD, UK
Richard C. Wilson
Centre for Vision, Speech and Signal Proc (CVSSP), University of Surrey, Guildford, GU2 7XH, Surrey, United Kingdom
Terry Windeatt
Electrical and Electronics Engineering Department, Middle East Technical University, 06531, Ankara, Turkey
Ilkay Ulusoy
Department of Computer Science and Artificial Intelligence, University of Alicante, P.O.B. 99, E-03080, Alicante, Spain
Francisco Escolano

About this paper

Cite this paper

Mallapragada, P.K., Jin, R., Jain, A. (2010). Non-parametric Mixture Models for Clustering. In: Hancock, E.R., Wilson, R.C., Windeatt, T., Ulusoy, I., Escolano, F. (eds) Structural, Syntactic, and Statistical Pattern Recognition. SSPR /SPR 2010. Lecture Notes in Computer Science, vol 6218. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14980-1_32

DOI: https://doi.org/10.1007/978-3-642-14980-1_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14979-5
Online ISBN: 978-3-642-14980-1
eBook Packages: Computer Science, Computer Science (R0)
