Abstract
Cluster validity is a long-standing challenge in the clustering literature. Although many evaluation measures have been developed for cluster validity, they often provide inconsistent information about clustering performance, and it remains unclear in practice which measures are most suitable. This chapter addresses that gap with an organized study of sixteen external validation measures for K-means clustering. Specifically, we first propose a filtering criterion based on the uniform effect of K-means and apply it to identify defective measures.
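To make the idea of an external validation measure concrete, the following is a minimal sketch, not the chapter's own procedure. It computes the Rand index, one widely used external measure, from true class labels and cluster labels, and it reports the coefficient of variation (CV) of group sizes as a rough indicator of how uniform a partition is. The abstract does not define the filtering criterion, so using the size CV here is an assumption for illustration, and the function names are hypothetical.

```python
from collections import Counter
from math import comb
from statistics import mean, stdev


def rand_index(labels_true, labels_pred):
    """Rand index: fraction of object pairs on which the two partitions agree."""
    n = len(labels_true)
    total_pairs = comb(n, 2)
    # Contingency counts: objects per (class, cluster) pair.
    table = Counter(zip(labels_true, labels_pred))
    same_both = sum(comb(c, 2) for c in table.values())
    same_class = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_cluster = sum(comb(c, 2) for c in Counter(labels_pred).values())
    # Pairs separated in both partitions, by inclusion-exclusion.
    apart_both = total_pairs - same_class - same_cluster + same_both
    return (same_both + apart_both) / total_pairs


def size_cv(labels):
    """Coefficient of variation of group sizes (assumed uniformity indicator).

    Needs at least two groups, because the sample standard deviation
    is undefined for a single value.
    """
    sizes = list(Counter(labels).values())
    return stdev(sizes) / mean(sizes)


# Hypothetical toy data: skewed true classes, evenly sized clusters.
labels_true = ["a"] * 8 + ["b"] * 2
labels_pred = [0] * 5 + [1] * 5

print("Rand index:", rand_index(labels_true, labels_pred))
print("CV of class sizes:", size_cv(labels_true))
print("CV of cluster sizes:", size_cv(labels_pred))
```

In this toy case the true classes are imbalanced while the clusters are equally sized, so the two CV values differ sharply. Comparing a measure's score against such a gap is one way to probe whether the measure reacts to uniform-sized clusterings.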
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Wu, J. (2012). Selecting External Validation Measures for K-means Clustering. In: Advances in K-means Clustering. Springer Theses. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29807-3_5
DOI: https://doi.org/10.1007/978-3-642-29807-3_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29806-6
Online ISBN: 978-3-642-29807-3
eBook Packages: Computer Science (R0)