A novel density-based clustering algorithm named QCC was recently proposed. Although the algorithm has demonstrated strong robustness, its two input parameters, the number of neighbors (k) and the similarity threshold (\(\alpha \)), must still be determined manually, which severely limits its practical adoption. In addition, QCC does not perform well on datasets with relatively high dimensionality. To overcome these defects, we first define a new method for computing local density and introduce the strategy of potential entropy into the original algorithm. Based on this idea, we propose a new QCC clustering algorithm (QCC-PE). QCC-PE automatically extracts the optimal value of the parameter k by optimizing the potential entropy of the data field. In this way, the parameter is calculated objectively from the dataset itself rather than estimated empirically from a large number of experiments. We then apply t-distributed stochastic neighbor embedding (tSNE) to the QCC-PE model, yielding a tSNE-based variant (QCC-PE-tSNE) that preprocesses high-dimensional datasets by dimensionality reduction. We compare the performance of the proposed algorithms with QCC, DBSCAN, and DP on synthetic datasets, the Olivetti Face Database, and real-world datasets, respectively. Experimental results show that our algorithms are feasible and effective and often outperform the compared methods.
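The central idea sketched in the abstract, selecting a neighborhood parameter by minimizing the potential entropy of the data field, can be illustrated as follows. The exact QCC-PE formulation is not given here, so this is only a minimal sketch of the general data-field idea: each point contributes a Gaussian potential, and a hypothetical impact factor `sigma` (standing in for the paper's parameter k) is chosen where the entropy of the normalized potentials is smallest.

```python
import numpy as np

def potential_entropy(X, sigma):
    """Entropy of the Gaussian data field induced by impact factor sigma.

    phi_i = sum_j exp(-(d_ij / sigma)^2) is the potential at point i;
    the entropy is computed over the normalized potentials
    p_i = phi_i / sum(phi).
    """
    # Pairwise Euclidean distances via broadcasting.
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    phi = np.exp(-((d / sigma) ** 2)).sum(axis=1)
    p = phi / phi.sum()
    return float(-(p * np.log(p)).sum())

def select_impact_factor(X, candidates):
    """Pick the candidate impact factor whose data field has minimal
    potential entropy, i.e. the field that best exposes cluster structure."""
    return min(candidates, key=lambda s: potential_entropy(X, s))
```

In the QCC-PE-tSNE setting, a high-dimensional dataset would first be embedded into a low-dimensional space (e.g., with scikit-learn's `TSNE`) before any such parameter selection and the subsequent clustering are run.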
The authors would like to express their sincere thanks to the editor and the anonymous reviewers for their valuable and insightful comments. This work is supported by the National Natural Science Foundation of China (NSFC) (No. 61170110) and Zhejiang Provincial Natural Science Foundation of China (No. LY13F020043).
Compliance with ethical standards
Conflict of interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
This article does not contain any studies with human participants or animals performed by any of the authors.
Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the 1998 ACM SIGMOD international conference on management of data, vol 27, pp 94–105
Ankerst M, Breunig MM, Kriegel HP, Sander J (1999) OPTICS: ordering points to identify the clustering structure. In: Proceedings of the 1999 ACM SIGMOD international conference on management of data, vol 28, pp 49–60
Barbieri F, Mazzoni A, Logothetis NK, Panzeri S, Brunel N (2014) Stimulus dependence of local field potential spectra: experiment versus theory. J Neurosci 34(44):14589–14605
Carpenter GA, Grossberg S (1987) A massively parallel architecture for a self-organizing neural pattern recognition machine. Comput Vis Gr Image Process 37(1):54–115
Carpenter GA, Grossberg S (1990) ART 3: hierarchical search using chemical transmitters in self-organizing pattern recognition architectures. Neural Netw 3(2):129–152
Cassisi C, Ferro A, Giugno R, Pigola G, Pulvirenti A (2013) Enhancing density-based clustering: parameter reduction and outlier detection. Inf Syst 38(3):317–330
Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24(5):603–619
Ding SF, Du MJ, Sun TF, Xu X, Xue Y (2017) An entropy-based density peaks clustering algorithm for mixed type data employing fuzzy neighborhood. Knowl-Based Syst 133:294–313
Ding SF, Jia HJ, Du MJ, Xue Y (2018) A semi-supervised approximate spectral clustering algorithm based on HMRF model. Inf Sci 429:215–228
Du MJ, Ding SF, Jia HJ (2016) Study on density peaks clustering based on k-nearest neighbors and principal component analysis. Knowl-Based Syst 99:135–145
Dutta M, Mahanta AK, Pujari AK (2005) QROCK: a quick version of the ROCK algorithm for clustering of categorical data. Pattern Recogn Lett 26(15):2364–2373
Ester M, Kriegel HP, Sander J, Xu XW (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of international conference on knowledge discovery and data mining, vol 96, pp 226–231
Kumar KM, Reddy ARM (2016) A fast DBSCAN clustering algorithm by accelerating neighbor searching using groups method. Pattern Recogn 58:39–48
Li YL, Shen Y (2010) An automatic fuzzy c-means algorithm for image segmentation. Soft Comput 14(2):123–128
Liew AW, Yan H (2003) An adaptive spatial fuzzy clustering algorithm for 3-D MR image segmentation. IEEE Trans Med Imaging 22(9):1063–1075
MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol 1, pp 281–297
Madan S, Dana KJ (2015) Modified balanced iterative reducing and clustering using hierarchies (m-BIRCH) for visual clustering. Pattern Anal Appl 19:1–18
Mehmood R, Zhang G, Bie R, Dawood H, Ahmad H (2016) Clustering by fast search and find of density peaks via heat diffusion. Neurocomputing 208:210–217
Omran MGH, Engelbrecht AP, Salman A (2007) An overview of clustering methods. Intell Data Anal 11(6):583–605
Park HS, Jun CH (2009) A simple and fast algorithm for k-medoids clustering. Expert Syst Appl 36(2):3336–3341
Rasmussen CE (2000) The infinite Gaussian mixture model. Adv Neural Inf Process Syst 12:554–560
Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344(6191):1492–1496
Tomasev N, Radovanovic M, Mladenic D, Ivanovic M (2014) The role of hubness in clustering high-dimensional data. IEEE Trans Knowl Data Eng 26(3):739–751
Zahn CT (1971) Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Trans Comput 100(1):68–86
Zang WK, Ren LY, Zhang WQ, Liu XY (2017) Automatic density peaks clustering using DNA genetic algorithm optimized data field and Gaussian process. Int J Pattern Recognit Artif Intell 31(8):1750023
Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: an efficient data clustering method for very large databases. ACM SIGMOD Record 25(2):103–114
Zhang T, Ramakrishnan R, Livny M (1997) BIRCH: a new data clustering algorithm and its applications. Data Min Knowl Disc 1(2):141–182