A Novel Method for Identifying Optimal Number of Clusters with Marginal Differential Entropy
Clustering evaluation plays an important role in clustering algorithms. Most recent approaches that evaluate clusterings and identify the optimal number of clusters either compute pairwise distances between data points or evaluate entropy over the entire multidimensional space, and therefore have high computational complexity. In this paper, we propose an entropy-based clustering evaluation method for identifying the optimal number of clusters. It first projects the cluster centroids onto each individual dimension and then accumulates the marginal differential entropy of each dimension. From the sum of these marginal entropies we can analyze clustering performance and identify the optimal number of clusters. This method can dramatically reduce computational complexity without losing accuracy. Experimental results show that the proposed method is highly stable under various conditions and can be applied to massive, high-dimensional datasets.
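The abstract does not say how each marginal differential entropy is estimated or how the summed scores are turned into an optimal number of clusters, so the following is only a minimal sketch under stated assumptions. It clusters with a plain Lloyd's k-means (the helper `kmeans` is hypothetical), projects the k centroids onto each dimension, estimates each marginal entropy with the Gaussian plug-in formula h_d = ½ ln(2πe σ_d²) (our assumption, not necessarily the paper's estimator), and sums over dimensions. The names `marginal_entropy_sum` and `entropy_curve` are likewise hypothetical. Once the centroids exist, each score costs O(k·D) for k centroids in D dimensions, compared with O(N²·D) for pairwise distances between N data points.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (hypothetical helper); returns k centroids as lists."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            groups[j].append(p)
        # Recompute centroids; keep the previous centroid if a cluster is empty.
        new = [[sum(col) / len(g) for col in zip(*g)] if g else centroids[j]
               for j, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    return centroids


def marginal_entropy_sum(centroids, eps=1e-12):
    """Sum over dimensions of the marginal differential entropy of the centroids.

    ASSUMPTION: each 1-D projection is scored with the Gaussian plug-in
    estimate h_d = 0.5 * ln(2*pi*e*var_d); eps keeps log() finite when all
    centroids share the same coordinate in some dimension.
    """
    total = 0.0
    for column in zip(*centroids):  # projection of all centroids onto one dimension
        mean = sum(column) / len(column)
        var = sum((x - mean) ** 2 for x in column) / len(column)
        total += 0.5 * math.log(2 * math.pi * math.e * (var + eps))
    return total


def entropy_curve(points, k_values, seed=0):
    """Map each candidate k to the summed marginal entropy of its k-means centroids."""
    return {k: marginal_entropy_sum(kmeans(points, k, seed=seed)) for k in k_values}


if __name__ == "__main__":
    # Synthetic example: three Gaussian blobs in 2-D.
    rng = random.Random(1)
    blobs = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    data = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
            for cx, cy in blobs for _ in range(50)]
    for k, score in entropy_curve(data, range(2, 7)).items():
        print(k, round(score, 3))
```

The sketch deliberately stops at the curve of scores over candidate values of k: the rule the paper uses to read the optimal number of clusters off these sums is not given in the abstract, so none is implemented here.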
Keywords: Clustering Evaluation · Information Theory · Differential Entropy