A Signal-to-Noise Ratio Based Optimization Approach for Data Cluster Analysis
Cluster analysis problems arise frequently in multi-criteria decision analysis. These problems often require the number of clusters and their boundaries to be determined simultaneously, yet no satisfactory method is available to determine the number of clusters automatically. In this paper, we propose a simple and intuitive approach to address this issue. The proposed approach first aggregates a set of multi-criteria or multi-attribute data into a one-dimensional data set. Each data point is then treated as a candidate split that divides the data set into two groups. The between-group distance and the within-group variances are combined into a clustering quality measure called the signal-to-noise ratio (SNR). The plot of SNR versus data point indicates the number of clusters and their boundaries. Specifically, the cluster boundaries lie at the local maxima of the plot, which also determines the number of clusters. The proposed approach can be conveniently implemented in an Excel spreadsheet. Two real-world examples are included to illustrate the appropriateness of the proposed approach. The results are also validated by comparing them with those obtained from Gaussian kernel density estimation.
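The splitting procedure described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact SNR formula, so the definition used here (difference of the two group means divided by the square root of the summed within-group population variances) is an assumption made for illustration and may differ from the paper's. The function names `snr_profile` and `local_maxima` are hypothetical, and the input is assumed to be already aggregated into one-dimensional scores.

```python
from statistics import fmean, pvariance


def snr_profile(values):
    """Return (split_point, snr) pairs for each candidate split of the sorted data.

    Assumed SNR (not taken from the paper):
        (mean of upper group - mean of lower group)
        / sqrt(var of lower group + var of upper group)
    """
    xs = sorted(values)
    n = len(xs)
    profile = []
    # Split after index k; keep at least two points in each group
    # so that both within-group variances are defined.
    for k in range(1, n - 2):
        lower, upper = xs[:k + 1], xs[k + 1:]
        signal = fmean(upper) - fmean(lower)
        noise = (pvariance(lower) + pvariance(upper)) ** 0.5
        snr = signal / noise if noise > 0 else float("inf")
        profile.append((xs[k], snr))
    return profile


def local_maxima(profile):
    """Return split points whose SNR is strictly greater than both neighbours."""
    return [
        profile[i][0]
        for i in range(1, len(profile) - 1)
        if profile[i - 1][1] < profile[i][1] > profile[i + 1][1]
    ]


# Illustrative, made-up scores with two visibly separated groups.
scores = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
profile = snr_profile(scores)
boundaries = local_maxima(profile)
print(boundaries)
```

In this sketch each reported boundary is the largest value of the lower group at a local SNR maximum, so the number of clusters would be one more than the number of boundaries found. Maxima at the two ends of the profile are not reported, which is a simplification of this sketch.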
Keywords: Multi-criteria decision; Cluster analysis; Signal-to-noise ratio; Kernel density estimation
The research was supported by the National Natural Science Foundation of China (No. 71771029).
- 2. Jiang R (2009) Cluster analysis of maintenance management problems. In: 2009 IEEM. Hong Kong, pp 1150–1154
- 6. Wikipedia. Kernel density estimation. https://en.wikipedia.org/wiki/Kernel_density_estimation, last accessed 14 Dec 2015