
Soft Computing, Volume 23, Issue 14, pp 5645–5657

Quasi-cluster centers clustering algorithm based on potential entropy and t-distributed stochastic neighbor embedding

  • Xian Fang
  • Zhixin Tie (corresponding author)
  • Yinan Guan
  • Shanshan Rao
Methodologies and Application

Abstract

A novel density-based clustering algorithm named QCC was presented recently. Although the algorithm has proved to be strongly robust, its two input parameters, the number of neighbors (k) and the similarity threshold (\(\alpha \)), must still be determined manually, which severely limits its adoption. In addition, QCC does not perform well on datasets of relatively high dimension. To overcome these defects, we first define a new method for computing local density and introduce the strategy of potential entropy into the original algorithm. Based on this idea, we propose a new QCC clustering algorithm (QCC-PE). QCC-PE automatically extracts the optimal value of the parameter k by optimizing the potential entropy of the data field. In this way, the optimized parameter is computed objectively from the dataset itself rather than estimated empirically from a large number of experiments. We then apply t-distributed stochastic neighbor embedding (tSNE) to the QCC-PE model and further propose a tSNE-based variant (QCC-PE-tSNE), which preprocesses high-dimensional datasets by dimensionality reduction. We compare the performance of the proposed algorithms with QCC, DBSCAN, and DP on synthetic datasets, the Olivetti Face Database, and real-world datasets. Experimental results show that our algorithms are feasible and effective and often outperform the compared methods.
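The parameter-selection idea described above — picking the data-field parameter whose potential entropy is minimal — can be sketched as follows. This is an illustrative reconstruction of the general potential-entropy criterion for data fields, not the authors' QCC-PE implementation: QCC-PE optimizes the neighbor count k, whereas this sketch optimizes a Gaussian impact factor `sigma` over a candidate grid (the function names, the grid, and the synthetic data are all assumptions made for illustration).

```python
import numpy as np

def potential_entropy(X, sigma):
    """Potential entropy of the data field induced by X.

    Each point's potential is a Gaussian-kernel sum over all points;
    the Shannon entropy of the normalized potentials measures how
    evenly the field is distributed across the dataset.
    """
    # pairwise squared Euclidean distances (n x n)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    phi = np.exp(-sq / sigma ** 2).sum(axis=1)   # potential of each point
    p = phi / phi.sum()                          # normalize to a distribution
    return -(p * np.log(p)).sum()

def best_sigma(X, candidates):
    """Return the candidate impact factor that minimizes potential entropy."""
    entropies = [potential_entropy(X, s) for s in candidates]
    return candidates[int(np.argmin(entropies))]

# Two well-separated Gaussian blobs as a toy dataset (assumption).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
               rng.normal(3.0, 0.3, (50, 2))])
sigma = best_sigma(X, np.linspace(0.1, 3.0, 30))
```

For very small or very large impact factors the potentials become nearly uniform and the entropy approaches its maximum, so the minimizer lies at an intermediate scale that reflects the dataset's cluster structure; QCC-PE applies the same objectivity argument to select k.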

Keywords

Data clustering · Quasi-cluster centers clustering · Potential entropy · Optimal parameter · t-distributed stochastic neighbor embedding

Notes

Acknowledgements

The authors would like to express their sincere thanks to the editor and the anonymous reviewers for their valuable and insightful comments. This work is supported by the National Natural Science Foundation of China (NSFC) (No. 61170110) and Zhejiang Provincial Natural Science Foundation of China (No. LY13F020043).

Compliance with ethical standards

Conflict of interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Xian Fang (1)
  • Zhixin Tie (1, corresponding author)
  • Yinan Guan (1)
  • Shanshan Rao (1)
  1. School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou, China
