Automatically finding the number of clusters based on simulated annealing

Journal of Shanghai Jiaotong University (Science)

Abstract

This paper proposes automatically finding the number of clusters (AFNC), a method based on simulated annealing (SA) that determines both the number of clusters and their initial centers. It is a simple, automatic method that combines local search with two widely accepted global analysis techniques: careful seeding (CS) and the distance histogram (DH). Finding a cluster is formulated as mountain climbing, where a mountain is defined as the convergent domain of SA. When AFNC reaches the peak of a mountain, it has found one of the clusters in the dataset, and the peak is that cluster's initial center. AFNC then climbs another mountain from a new starting point chosen by CS, and repeats until the termination condition is satisfied. During each climb, the local dense region in which SA searches for its next state is found by analyzing the distance histogram. Experimental results show that AFNC achieves consistent performance across a wide range of datasets.
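
The three ingredients named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, since the full method is not given on this page. It assumes the standard k-means++ rule for careful seeding (sampling proportional to squared distance from the nearest known center), a plain histogram of Euclidean distances from the current state, and the usual Metropolis acceptance rule for a maximization ("climbing") objective. The function names `careful_seed`, `distance_histogram`, and `metropolis_accept`, and the parameters `bins` and `temperature`, are hypothetical.

```python
import numpy as np


def careful_seed(points, centers, rng):
    """Sample the index of a new starting point with probability
    proportional to D(x)^2, the squared distance from x to its nearest
    already-found center (the k-means++ seeding rule, assumed here)."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Pairwise squared distances, shape (n_points, n_centers).
    diffs = points[:, None, :] - centers[None, :, :]
    d2 = np.min(np.sum(diffs ** 2, axis=2), axis=1)
    total = d2.sum()
    if total == 0.0:
        raise ValueError("every point coincides with an existing center")
    return int(rng.choice(len(points), p=d2 / total))


def distance_histogram(points, state, bins=10):
    """Histogram of Euclidean distances from the current state to every
    point; returns (counts, bin_edges) as numpy.histogram does."""
    points = np.asarray(points, dtype=float)
    dists = np.linalg.norm(points - np.asarray(state, dtype=float), axis=1)
    return np.histogram(dists, bins=bins)


def metropolis_accept(delta, temperature, rng):
    """Metropolis rule for maximization: always accept an improvement
    (delta >= 0); accept a worse move with probability exp(delta / T)."""
    if delta >= 0:
        return True
    return bool(rng.random() < np.exp(delta / temperature))
```

With one center already found, `careful_seed` tends to return a point far from it, which matches the role the abstract gives CS in choosing the next mountain to climb. How AFNC reads the histogram to delimit the local dense region, and which objective SA climbs, are not stated here and are left open in the sketch.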

Author information

Corresponding author

Correspondence to Tao Fang  (方 涛).

Additional information

Foundation item: the National Basic Research Program (973) of China (No. 2012CB719903), and the National Natural Science Foundation of China (No. 41071256)

About this article

Cite this article

Yang, Z., Huo, H. & Fang, T. Automatically finding the number of clusters based on simulated annealing. J. Shanghai Jiaotong Univ. (Sci.) 22, 139–147 (2017). https://doi.org/10.1007/s12204-017-1813-9

  • DOI: https://doi.org/10.1007/s12204-017-1813-9
