Multi-feature weighting neighborhood density clustering

Abstract

Clustering is an important data mining method for discovering knowledge and patterns, and feature weighting is widely applied in high-dimensional data mining. This paper proposes a multi-feature weighting neighborhood density clustering algorithm. Several dimension reduction algorithms are used to generate different features, and the weight of each feature is determined by its discrimination ability. Within the clustering algorithm, the center points are selected using the upper and lower approximation sets. The final clustering result is obtained by fusing the multiple clustering results. The proposed algorithm and the comparison algorithms are run on synthetic and real-world data sets. The test results show that the proposed algorithm outperforms the comparison algorithms on most of the experimental data sets, indicating that it is effective for data clustering.
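The overall shape of such a pipeline (several reduced feature views, one weight per view, fusion of the per-view clusterings) can be illustrated with a minimal sketch. It is not the authors' algorithm: plain k-means stands in for the neighborhood density clustering step, the rough-set center selection is omitted, the weight is a hypothetical between/within scatter ratio, and the fusion is a weighted co-association matrix. All function names are invented for illustration.

```python
import numpy as np

def pca_view(X, dim):
    """Project centered data onto its top `dim` principal directions (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:dim].T

def random_projection_view(X, dim, rng):
    """A second, different view: Gaussian random projection (assumption)."""
    R = rng.normal(size=(X.shape[1], dim)) / np.sqrt(dim)
    return (X - X.mean(axis=0)) @ R

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means; a stand-in for the density-based step."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def discrimination_score(X, labels):
    """Hypothetical weight: between-cluster scatter over within-cluster scatter."""
    overall = X.mean(axis=0)
    between = within = 0.0
    for j in np.unique(labels):
        members = X[labels == j]
        mu = members.mean(axis=0)
        between += len(members) * ((mu - overall) ** 2).sum()
        within += ((members - mu) ** 2).sum()
    return between / (within + 1e-12)

def fuse(label_sets, weights):
    """Weighted co-association: entry (i, j) is the weighted share of views
    that place points i and j in the same cluster."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    n = len(label_sets[0])
    co = np.zeros((n, n))
    for labels, wi in zip(label_sets, w):
        co += wi * (labels[:, None] == labels[None, :])
    return co, w

def consensus_labels(co, threshold=0.5):
    """Final clusters: connected components of the graph co > threshold."""
    labels = -np.ones(len(co), dtype=int)
    current = 0
    for start in range(len(co)):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero((co[i] > threshold) & (labels < 0)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels

# Toy data: two Gaussian blobs in 10 dimensions (synthetic, for illustration only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(30, 10)),
               rng.normal(4.0, 0.5, size=(30, 10))])

views = [pca_view(X, 2), random_projection_view(X, 2, rng)]
label_sets = [kmeans(V, 2, rng) for V in views]
weights = [discrimination_score(V, l) for V, l in zip(views, label_sets)]
co, w = fuse(label_sets, weights)
final = consensus_labels(co)
```

A co-association matrix is used here because it sidesteps the label-matching problem between views: cluster IDs from different clusterings never need to be aligned, only the pairwise "same cluster" relation is averaged.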

Acknowledgements

This work was supported by the National Key Research and Development Program of China (Nos. 2017YFB1300200, 2017YFB1300203), the National Natural Science Fund of China (Nos. 61672130, 61602082, 61627808, 91648205), the Open Program of State Key Laboratory of Software Architecture (No. SKLSAOP1701), the MOE Research Center for Online Education of China (No. 2016YB121), the LiaoNing Revitalization Talents Program (No. XLYC1806006), the Fundamental Research Funds for the Central Universities (Nos. DUT19RC(3)012, DUT17RC(3)071) and the development of science and technology of Guangdong province special fund project (No. 2016B090910001). The authors are grateful to the editor and the anonymous reviewers for constructive comments that helped to improve the quality and presentation of this paper.

Author information

Corresponding author

Correspondence to Lin Feng.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Xu, S., Feng, L., Liu, S. et al. Multi-feature weighting neighborhood density clustering. Neural Comput & Applic 32, 9545–9565 (2020). https://doi.org/10.1007/s00521-019-04467-4

Keywords

  • Clustering analysis
  • Multi-feature
  • Neighborhood density
  • Rough set
  • Granular computing