
A Comparative Study of Several Cluster Number Selection Criteria

  • Conference paper
Intelligent Data Engineering and Automated Learning (IDEAL 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2690)

Abstract

The selection of the number of clusters is an important and challenging issue in cluster analysis. In this paper we present an experimental comparison of several criteria for determining the number of clusters under the Gaussian mixture model. The criteria considered include Akaike’s information criterion (AIC), the consistent Akaike’s information criterion (CAIC), the minimum description length (MDL) criterion, which formally coincides with the Bayesian inference criterion (BIC), and two model selection methods derived from Bayesian Ying-Yang (BYY) harmony learning: the harmony empirical learning criterion (BYY-HEC) and the harmony data smoothing criterion (BYY-HDS). We investigate these methods on synthetic data sets of varying sample size and on the iris data set. The experimental results show that BYY-HDS has the best overall success rate and clearly outperforms the other methods for small sample sizes. CAIC and MDL tend to underestimate the number of clusters, while AIC and BYY-HEC tend to overestimate it, especially for small sample sizes.
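The behavior described in the abstract follows from the penalty terms the classical criteria attach to model complexity. As a minimal sketch (not the paper's implementation), the standard definitions are AIC = −2 log L + 2m, MDL/BIC = −2 log L + m log n, and CAIC = −2 log L + m(log n + 1), where m is the number of free parameters and n the sample size; for a k-component Gaussian mixture in d dimensions with full covariances, m = (k−1) + kd + kd(d+1)/2. The log-likelihood values below are invented purely for illustration:

```python
import math

def n_params(k, d):
    """Free parameters of a k-component, d-dimensional Gaussian mixture with
    full covariances: (k-1) weights + k*d means + k*d*(d+1)/2 covariance terms."""
    return (k - 1) + k * d + k * d * (d + 1) // 2

def aic(loglik, k, d):
    # Akaike's criterion: constant penalty of 2 per free parameter
    return -2 * loglik + 2 * n_params(k, d)

def mdl_bic(loglik, k, d, n):
    # MDL / BIC: penalty of log(n) per free parameter
    return -2 * loglik + n_params(k, d) * math.log(n)

def caic(loglik, k, d, n):
    # Consistent AIC: heavier penalty of log(n) + 1 per free parameter
    return -2 * loglik + n_params(k, d) * (math.log(n) + 1)

# Invented maximized log-likelihoods for k = 1..4 on n = 100 points in d = 2
logliks = {1: -520.0, 2: -470.0, 3: -455.0, 4: -445.0}
best_aic = min(logliks, key=lambda k: aic(logliks[k], k, 2))
best_mdl = min(logliks, key=lambda k: mdl_bic(logliks[k], k, 2, 100))
# With these numbers, AIC selects k = 4 while MDL/BIC selects k = 3
```

Since the per-parameter penalty of MDL/BIC, log n, exceeds AIC's constant 2 whenever n > e² ≈ 7.4, and CAIC's log n + 1 is heavier still, MDL and CAIC favor fewer components than AIC on the same fits. This is consistent with the paper's finding that AIC tends to overestimate the number of clusters while CAIC and MDL tend to underestimate it.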




Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hu, X., Xu, L. (2003). A Comparative Study of Several Cluster Number Selection Criteria. In: Liu, J., Cheung, Ym., Yin, H. (eds) Intelligent Data Engineering and Automated Learning. IDEAL 2003. Lecture Notes in Computer Science, vol 2690. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45080-1_27

  • DOI: https://doi.org/10.1007/978-3-540-45080-1_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40550-4

  • Online ISBN: 978-3-540-45080-1

