Automatically finding the number of clusters based on simulated annealing

Yang, Zhengwu; Huo, Hong; Fang, Tao

doi:10.1007/s12204-017-1813-9

Published: 31 March 2017

Volume 22, pages 139–147 (2017)

Journal of Shanghai Jiaotong University (Science)

Zhengwu Yang (杨政武)¹,
Hong Huo (霍宏)¹ &
Tao Fang (方涛)¹

Abstract

This paper proposes automatically finding the number of clusters (AFNC), a method based on simulated annealing (SA) that determines both the number of clusters and their initial centers. AFNC is a simple, automatic method that combines local search with two widely accepted global analysis techniques: careful seeding (CS) and the distance histogram (DH). Finding a cluster is formulated as mountain climbing, where a mountain is defined as the convergent domain of SA. When AFNC arrives at the peak of a mountain, it has found one cluster in the dataset, and the peak serves as that cluster's initial center. AFNC then climbs another mountain from a new starting point chosen by CS, and repeats until the termination condition is satisfied. During each climb, the local dense region in which SA searches for its next state is found by analyzing the distance histogram. Experimental results show that AFNC achieves consistent performance across a wide range of datasets.
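
The abstract does not spell out how CS picks a new starting point, so the following is only a minimal sketch of the careful-seeding idea as it is commonly described for k-means++: a candidate is drawn with probability proportional to its squared distance from the nearest center already found. The function names `squared_distance` and `careful_seed`, the toy data, and the fallback for the all-zero case are assumptions made for illustration and are not taken from the paper.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def careful_seed(points, centers, rng):
    """Draw one point with probability proportional to D(x)^2.

    D(x) is the distance from x to its nearest already-found center,
    so points far from every known center are favored.
    This follows the k-means++ seeding rule; whether AFNC uses exactly
    this rule is an assumption.
    """
    weights = [min(squared_distance(p, c) for c in centers) for p in points]
    if sum(weights) == 0.0:
        # Every point coincides with a center: fall back to uniform choice.
        return rng.choice(points)
    return rng.choices(points, weights=weights, k=1)[0]


# Toy data: two well-separated groups, with one center already found.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
centers = [(0.0, 0.0)]
rng = random.Random(0)
next_start = careful_seed(points, centers, rng)
print(next_start)
```

With one center at the origin, almost all of the sampling weight falls on the distant group, so the next climb is very likely to start near a cluster that has not been found yet.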

Author information

Authors and Affiliations

Department of Automation, Shanghai Jiao Tong University, Shanghai, 200240, China
Zhengwu Yang (杨政武), Hong Huo (霍宏) & Tao Fang (方涛)

Corresponding author

Correspondence to Tao Fang (方涛).

Additional information

Foundation item: the National Basic Research Program (973) of China (No. 2012CB719903), and the National Natural Science Foundation of China (No. 41071256)

About this article

Cite this article

Yang, Z., Huo, H. & Fang, T. Automatically finding the number of clusters based on simulated annealing. J. Shanghai Jiaotong Univ. (Sci.) 22, 139–147 (2017). https://doi.org/10.1007/s12204-017-1813-9

Received: 01 March 2016
Published: 31 March 2017
Issue Date: April 2017
DOI: https://doi.org/10.1007/s12204-017-1813-9

CLC number

TP 301.6

Document code

A
