
Journal of Signal Processing Systems, Volume 91, Issue 10, pp 1219–1236

Density Peak Clustering Based on Cumulative Nearest Neighbors Degree and Micro Cluster Merging

  • Lizhong Xu
  • Jia Zhao
  • Zhanfeng Yao
  • Aiye Shi
  • Zhe Chen

Abstract

Rodriguez and Laio published the clustering by fast search and find of density peaks (DPC) algorithm in Science in June 2014. It locates density peaks quickly and clusters datasets efficiently, but it has two drawbacks. First, its local density definitions are too simple for datasets that contain both dense and sparse clusters, so neither of the two definitions identifies the density peaks correctly. Second, its assignment strategy has poor fault tolerance: once a point is misallocated, subsequent assignments propagate and amplify the error, which seriously degrades the clustering result. To address these problems, a new clustering method, density peak clustering based on cumulative nearest neighbors degree and micro cluster merging, is proposed. It improves DPC in two ways: a new local density definition overcomes the first defect, and graph degree linkage is combined with DPC to alleviate the allocation-error problem. Experiments on synthetic and real-world datasets show that the proposed method outperforms DPC, DBSCAN, OPTICS, AP, K-means, and other DPC variants.
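For context, the sketch below computes only the standard DPC decision-graph quantities (a Gaussian-kernel local density rho and the separation delta to the nearest denser point), not the cumulative nearest-neighbors degree or the micro-cluster merging proposed in this paper; the function name and the cutoff parameter dc are illustrative assumptions.

```python
import numpy as np

def dpc_decision_values(X, dc):
    """Sketch of standard DPC (Rodriguez & Laio, 2014) decision-graph values.

    X  : (n, d) array of data points.
    dc : cutoff distance for the Gaussian-kernel density (user-chosen).
    Returns rho, delta, and gamma = rho * delta for each point.
    """
    # Pairwise Euclidean distances.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))

    # Gaussian-kernel local density; subtract 1 to drop each point's
    # contribution to its own density (dist to itself is 0).
    rho = np.exp(-(dist / dc) ** 2).sum(axis=1) - 1.0

    # delta_i: distance to the nearest point with higher density.
    # The globally densest point gets the maximum pairwise distance.
    n = len(X)
    delta = np.zeros(n)
    order = np.argsort(-rho)                 # indices, densest first
    delta[order[0]] = dist[order[0]].max()
    for k in range(1, n):
        i = order[k]
        higher = order[:k]                   # all points denser than i
        delta[i] = dist[i, higher].min()

    # Candidate density peaks are points with large rho * delta.
    gamma = rho * delta
    return rho, delta, gamma
```

In standard DPC the points with the largest gamma values are taken as cluster centers, and every remaining point is assigned to the cluster of its nearest neighbor of higher density; this one-pass assignment is the step whose error propagation the micro-cluster merging strategy described above is designed to mitigate.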

Keywords

Density peak clustering · K-nearest neighbors · Local density · Micro cluster · Allocation strategy


Acknowledgments

This research was supported by the National Natural Science Foundation of China under Grants Nos. 71433003 and 51669014, and by the Science Fund for Distinguished Young Scholars of Jiangxi Province under Grant No. 2018ACB21029.

References

  1. Cui, Z., Zhang, J., Wang, Y., Cao, Y., Cai, X., Zhang, W., & Chen, J. (2019). A pigeon-inspired optimization algorithm for many-objective optimization problems. Science China Information Sciences, 62(7), 070212. https://doi.org/10.1007/s11432-018-9729-5.
  2. Zhang, M., Wang, H., Cui, Z., et al. (2018). Hybrid multi-objective cuckoo search with dynamical local search. Memetic Computing, 10(2), 199–208.
  3. Cui, Z., Sun, B., Wang, G., et al. (2017). A novel oriented cuckoo search algorithm to improve DV-hop performance for cyber-physical systems. Journal of Parallel and Distributed Computing, 103, 42–52.
  4. Cai, X., Gao, X.-z., & Xue, Y. (2016). Improved bat algorithm with optimal forage strategy and random disturbance strategy. International Journal of Bio-inspired Computation, 8(4), 205–214.
  5. Gai, K., & Qiu, M. (2018). Blend arithmetic operations on tensor-based fully homomorphic encryption over real numbers. IEEE Transactions on Industrial Informatics, 14(8), 3590–3598.
  6. Gai, K., Choo, K. K. R., Qiu, M., & Zhu, L. (2018). Privacy-preserving content-oriented wireless communication in internet-of-things. IEEE Internet of Things Journal, 5(4), 3059–3067.
  7. Wang, G., Cai, X., Cui, Z., et al. (2017). High performance computing for cyber physical social systems by using evolutionary multi-objective optimization algorithm. IEEE Transactions on Emerging Topics in Computing. https://doi.org/10.1109/TETC.2017.2703784.
  8. Qiu, M., Gai, K., & Xiong, Z. (2018). Privacy-preserving wireless communications using bipartite matching in social big data. Future Generation Computer Systems, 87, 772–781.
  9. Wu, P., Lu, Z., Zhou, Q., Lei, Z., Li, X., Qiu, M., & Hung, P. C. K. (2019). Big data logs analysis based on seq2seq networks for cognitive internet of things. Future Generation Computer Systems, 90, 477–488.
  10. Cui, Z., Xue, F., Cai, X., Cao, Y., Wang, G.-g., & Chen, J. (2018). Detection of malicious code variants based on deep learning. IEEE Transactions on Industrial Informatics, 14(7), 3187–3196.
  11. Cui, Z., Cao, Y., Cai, X., Cai, J., & Chen, J. (2017). Optimal LEACH protocol with modified bat algorithm for big data sensing systems in internet of things. Journal of Parallel and Distributed Computing. https://doi.org/10.1016/j.jpdc.2017.12.014.
  12. MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, 281–297.
  13. Zhang, T., Ramakrishnan, R., & Livny, M. (1997). BIRCH: A new data clustering algorithm and its applications. Data Mining and Knowledge Discovery, 1(2), 141–182.
  14. Ester, M., et al. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 226–231.
  15. Frey, B. J., & Dueck, D. (2007). Clustering by passing messages between data points. Science, 315(5814), 972–976.
  16. Wang, W., Yang, J., & Muntz, R. R. (1997). STING: A statistical information grid approach to spatial data mining. Proceedings of the 23rd International Conference on Very Large Data Bases, 186–195.
  17. Rodriguez, A., & Laio, A. (2014). Clustering by fast search and find of density peaks. Science, 344(6191), 1492–1496.
  18. Xu, M., Li, Y., Li, R., Zou, F., & Gu, X. (2019). EADP: An extended adaptive density peaks clustering for overlapping community detection in social networks. Neurocomputing, 337, 287–302.
  19. Mehmood, R., El-Ashram, S., Bie, R., et al. (2018). Effective cancer subtyping by employing density peaks clustering by using gene expression microarray. Personal and Ubiquitous Computing, 22(3), 615–619.
  20. Liao, E., & Liu, C. (2018). A hierarchical algorithm based on density peaks clustering and ant colony optimization for traveling salesman problem. IEEE Access, 6, 38921–38933.
  21. Zhang, W., Wang, X., Zhao, D., et al. (2012). Graph degree linkage: Agglomerative clustering on a directed graph. Proceedings of the European Conference on Computer Vision, 428–441.
  22. Zhou, Z., Gangquan, S., Yanbin, Z., et al. (2018). Robust clustering by identifying the veins of clusters based on kernel density estimation. Knowledge-Based Systems, 159, 309–320.
  23. Xue, X., Gan, S., Peng, H., et al. (2018). Improved density peaks clustering algorithm combining K-nearest neighbors. Computer Engineering and Applications, 54(7), 36–43.
  24. Du, M., Ding, S., & Xue, Y. (2017). A robust density peaks clustering algorithm using fuzzy neighborhood. International Journal of Machine Learning and Cybernetics, 12, 1–10.
  25. Zang, W., Ren, L., Zhang, W., & Liu, X. (2017). Automatic density peaks clustering using DNA genetic algorithm optimized data field and Gaussian process. International Journal of Pattern Recognition and Artificial Intelligence, 31(8), 1750023.
  26. Qiu, B., & Cheng, L. (2018). A parameter-free clustering algorithm based on Laplace centrality and density peaks. Journal of Computer Applications, 38(9), 2511–2514.
  27. Yaohui, L., Zhengming, M., & Fang, Y. (2017). Adaptive density peak clustering based on K-nearest neighbors with aggregating strategy. Knowledge-Based Systems, 133, 208–220.
  28. Xie, J., Gao, H., Xie, W., Liu, X., & Grant, P. W. (2016). Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors. Information Sciences, 354, 19–40.
  29. Ankerst, M., Breunig, M. M., Kriegel, H.-P., et al. (1999). OPTICS: Ordering points to identify the clustering structure. ACM SIGMOD Record, 49–60.
  30. Vinh, N., Epps, J., & Bailey, J. (2010). Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research, 11, 2837–2854.
  31. Fowlkes, E. B., & Mallows, C. L. (1983). A method for comparing two hierarchical clusterings. Journal of the American Statistical Association, 78(383), 553–569.
  32. Liu, R., Wang, H., & Yu, X. (2018). Shared-nearest-neighbor-based clustering by fast search and find of density peaks. Information Sciences, 450, 200–226.
  33. Jain, A. K., & Law, M. H. (2005). Data clustering: A user's dilemma. Proceedings of the International Conference on Pattern Recognition and Machine Intelligence, 1–10.
  34. Chang, H., & Yeung, D.-Y. (2008). Robust path-based spectral clustering. Pattern Recognition, 41(1), 191–203.
  35. Gionis, A., Mannila, H., & Tsaparas, P. (2007). Clustering aggregation. ACM Transactions on Knowledge Discovery from Data, 1(1), 1–30.
  36. Fu, L., & Medico, E. (2007). FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data. BMC Bioinformatics, 8(1), 3.
  37. Veenman, C. J., Reinders, M. J. T., & Backer, E. (2002). A maximum variance cluster algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(9), 1273–1280.
  38. Fränti, P., Virmajoki, O., & Hautamäki, V. (2006). Fast agglomerative clustering using a k-nearest neighbor graph. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(11), 1875–1881.
  39. Fränti, P., & Virmajoki, O. (2006). Iterative shrinking method for clustering problems. Pattern Recognition, 39(5), 761–775.
  40. Bache, K., & Lichman, M. (2013). UCI machine learning repository. http://archive.ics.uci.edu/ml. Irvine: University of California.
  41. Street, W. N., Wolberg, W. H., & Mangasarian, O. L. (1993). Nuclear feature extraction for breast tumor diagnosis. Proceedings of the IS&T/SPIE International Symposium on Electronic Imaging: Science and Technology, 1905, 861–870.
  42. Charytanowicz, M., Niewczas, J., Kulczycki, P., et al. (2010). Complete gradient clustering algorithm for features analysis of X-ray images. Advances in Intelligent and Soft Computing, 69, 15–24.
  43. Dias, D. B., Madeo, R. C. B., Rocha, T., et al. (2009). Hand movement recognition for Brazilian sign language: A study using distance-based neural networks. Proceedings of the International Joint Conference on Neural Networks, 697–704.
  44. Sigillito, V. G., Wing, S. P., Hutton, L. V., et al. (1989). Classification of radar returns from the ionosphere using neural networks. Johns Hopkins APL Technical Digest, 10(3), 262–266.
  45. Breiman, L., Friedman, J., Stone, C. J., et al. (1984). Classification and regression trees. Boca Raton: CRC Press.
  46. Ding, J., He, X., Yuan, J., & Jiang, B. (2018). Automatic clustering based on density peak detection using generalized extreme value distribution. Soft Computing, 22, 2777–2796.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. College of Computer and Information Engineering, Hohai University, Nanjing, China
  2. School of Information Engineering, Nanchang Institute of Technology, Nanchang, China
