Abstract
Clustering is an important data mining technique for discovering knowledge and patterns, and feature weighting is widely used in high-dimensional data mining. This paper proposes a multi-feature weighting neighborhood density clustering algorithm. Different dimension reduction algorithms are used to generate different feature sets, and the weight of each feature set is determined by its discrimination ability. In the clustering step, center points are selected using the upper and lower approximation sets. Finally, the overall clustering result is obtained by fusing the multiple individual clustering results. The proposed algorithm and several comparison algorithms were run on synthetic and real-world data sets; the results show that the proposed algorithm outperforms the comparison algorithms on most of the experimental data sets, demonstrating its effectiveness for data clustering.
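The pipeline the abstract describes — several reduced feature views, per-view weights, a base clustering per view, and a fused final partition — can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: PCA stands in for the various dimension-reduction methods, plain k-means replaces the neighborhood density clustering with rough-approximation center selection, and equal weights replace the discrimination-ability weights.

```python
import numpy as np

def pca(X, k):
    """Project onto the top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def kmeans(X, k, iters=50):
    """Plain k-means with deterministic farthest-point initialization
    (a stand-in for the paper's rough-set-based center selection)."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = ((X[:, None] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def fuse(label_sets, weights, tau=0.5):
    """Fuse partitions via a weighted co-association matrix: two points
    share a final cluster iff their weighted co-clustering frequency
    reaches tau (connected components of the thresholded graph)."""
    n = len(label_sets[0])
    co = np.zeros((n, n))
    for w, lab in zip(weights, label_sets):
        co += w * (lab[:, None] == lab[None, :])
    co /= sum(weights)
    final = -np.ones(n, dtype=int)
    c = 0
    for i in range(n):
        if final[i] < 0:
            stack = [i]
            final[i] = c
            while stack:
                u = stack.pop()
                for v in np.where((co[u] >= tau) & (final < 0))[0]:
                    final[v] = c
                    stack.append(v)
            c += 1
    return final

# Toy data: two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (40, 3)), rng.normal(8, 1, (40, 3))])

# Two feature views: the raw space and a 1-d PCA embedding.
views = [X, pca(X, 1)]
partitions = [kmeans(V, 2) for V in views]
# Equal weights here stand in for the discrimination-based weights.
final = fuse(partitions, [1.0, 1.0])
print(len(set(final.tolist())))  # → 2 fused clusters
```

The co-association fusion is label-permutation invariant, which is why the per-view k-means labelings need no alignment before fusing; the discrimination-based weights of the paper would simply replace the equal weights passed to `fuse`.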
Acknowledgements
This work was supported by the National Key Research and Development Program of China (Nos. 2017YFB1300200 and 2017YFB1300203), the National Natural Science Fund of China (Nos. 61672130, 61602082, 61627808 and 91648205), the Open Program of the State Key Laboratory of Software Architecture (No. SKLSAOP1701), the MOE Research Center for Online Education of China (No. 2016YB121), the Liaoning Revitalization Talents Program (No. XLYC1806006), the Fundamental Research Funds for the Central Universities (Nos. DUT19RC(3)012 and DUT17RC(3)071) and the Special Fund Project for the Development of Science and Technology of Guangdong Province (No. 2016B090910001). The authors are grateful to the editor and the anonymous reviewers for constructive comments that helped to improve the quality and presentation of this paper.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Xu, S., Feng, L., Liu, S. et al. Multi-feature weighting neighborhood density clustering. Neural Comput & Applic 32, 9545–9565 (2020). https://doi.org/10.1007/s00521-019-04467-4