Abstract

Work on clustering combination has shown that combination methods typically outperform single runs of clustering algorithms. While much work has been reported in the literature on validating data partitions produced by traditional clustering algorithms, little has been done to validate data partitions produced by clustering combination methods. We propose to assess the quality of a consensus partition using a pairwise pattern similarity induced from the set of data partitions that constitutes the clustering ensemble. We propose a new validity index based on the likelihood of the data set given a data partition, together with three modified versions of well-known clustering validity indices. The validity measures on the original, clustering ensemble, and similarity spaces are analysed and compared based on experimental results on several synthetic and real data sets.
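As a rough illustration of a pairwise similarity induced from a clustering ensemble, the sketch below builds a co-association matrix (the fraction of partitions in which two patterns share a cluster) and scores a consensus partition against it. The function names `co_association` and `within_between_gap` are hypothetical, and the score is a simple stand-in chosen for illustration; it is not one of the validity indices proposed in the paper, whose definitions the abstract does not give.

```python
from itertools import combinations


def co_association(ensemble):
    """Return an n x n matrix whose (i, j) entry is the fraction of
    partitions in the ensemble that place patterns i and j in the same cluster."""
    n = len(ensemble[0])
    counts = [[0] * n for _ in range(n)]
    for labels in ensemble:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(ensemble)
    return [[c / m for c in row] for row in counts]


def within_between_gap(sim, consensus):
    """Illustrative score (an assumption, not the paper's index):
    mean similarity of same-cluster pairs minus mean similarity of
    different-cluster pairs under the consensus partition."""
    within, between = [], []
    for i, j in combinations(range(len(consensus)), 2):
        target = within if consensus[i] == consensus[j] else between
        target.append(sim[i][j])

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(within) - mean(between)


# Toy ensemble: three partitions of four patterns (made-up labels).
ensemble = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
consensus = [0, 0, 1, 1]

sim = co_association(ensemble)
score = within_between_gap(sim, consensus)
```

A score near 1 would mean the consensus partition groups together patterns that the ensemble almost always co-clusters and separates patterns that it rarely does; the toy labels here are invented purely to exercise the functions.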

Keywords

Cluster Solution; Validity Index; Pairwise Similarity; Data Partition; Cluster Ensemble

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • João M. M. Duarte (1, 2)
  • Ana L. N. Fred (1)
  • André Lourenço (1)
  • F. Jorge F. Duarte (2)

  1. Instituto de Telecomunicações, Instituto Superior Técnico, Lisboa, Portugal
  2. GECAD - Knowledge Engineering and Decision Support Group, Instituto Superior de Engenharia do Porto, Porto, Portugal
