On Consensus Clustering Validation
Work on clustering combination has shown that combination methods typically outperform single runs of clustering algorithms. While much work in the literature addresses the validation of data partitions produced by traditional clustering algorithms, little has been done to validate data partitions produced by clustering combination methods. We propose to assess the quality of a consensus partition using a pairwise pattern similarity induced from the set of data partitions that constitutes the clustering ensemble. We introduce a new validity index, based on the likelihood of the data set given a data partition, together with three modified versions of well-known clustering validity indices. The validity measures in the original, clustering ensemble, and similarity spaces are analysed and compared using experimental results on several synthetic and real data sets.
Keywords: Cluster Solution, Validity Index, Pairwise Similarity, Data Partition, Cluster Ensemble
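The abstract does not say how the pairwise similarity is induced from the ensemble. One common construction is a co-association matrix, in which the similarity between two patterns is the fraction of ensemble partitions that place them in the same cluster. The Python sketch below assumes that construction. The function names `co_association` and `mean_within_between`, and the toy ensemble, are hypothetical. The within/between comparison is only an illustrative check in the similarity space, not one of the validity indices proposed in the paper.

```python
import numpy as np

def co_association(partitions):
    """Fraction of ensemble partitions in which each pair of patterns shares a cluster.

    Assumed construction (co-association); the abstract does not specify
    the exact similarity used.
    """
    labels = np.asarray(partitions)  # shape: (n_partitions, n_patterns)
    # same[p, i, j] is True when patterns i and j share a cluster in partition p
    same = labels[:, :, None] == labels[:, None, :]
    return same.mean(axis=0)

def mean_within_between(similarity, consensus):
    """Illustrative check only: mean induced similarity inside vs. across clusters."""
    consensus = np.asarray(consensus)
    same = consensus[:, None] == consensus[None, :]
    off_diag = ~np.eye(len(consensus), dtype=bool)  # ignore self-similarity
    within = similarity[same & off_diag].mean()
    between = similarity[~same].mean()
    return within, between

# Toy ensemble: three partitions of six patterns (hypothetical data)
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 2, 2],
]
S = co_association(ensemble)
within, between = mean_within_between(S, [0, 0, 0, 1, 1, 1])
print(f"within={within:.3f}, between={between:.3f}")
```

A consensus partition that agrees with the ensemble should show clearly higher within-cluster than between-cluster similarity. The sketch assumes the consensus has at least two clusters and at least one cluster with more than one pattern, since otherwise one of the means is undefined.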