Selecting External Validation Measures for K-means Clustering

  • Chapter
Advances in K-means Clustering

Part of the book series: Springer Theses

Abstract

Cluster validity is a long-standing challenge in the clustering literature. Although many evaluation measures have been developed for cluster validity, they often give inconsistent information about clustering performance, and it remains unclear in practice which measures are most suitable. This chapter addresses that gap with an organized study of sixteen external validation measures for K-means clustering. Specifically, we first propose a filtering criterion based on the uniform effect of K-means and apply it to identify defective measures.
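As a rough illustration of what an external validation measure computes, the sketch below scores a clustering against known class labels with two common measures, purity and the Rand index. The abstract does not say which sixteen measures are studied or how the filtering criterion is defined, so everything here is an assumption: the function names, the toy data, and in particular the use of the coefficient of variation of group sizes as a stand-in for the "uniform effect" (the tendency of K-means to produce clusters of similar size).

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def purity(true_labels, cluster_labels):
    """Fraction of points that belong to the majority class of their cluster."""
    by_cluster = {}
    for t, c in zip(true_labels, cluster_labels):
        by_cluster.setdefault(c, Counter())[t] += 1
    majority = sum(max(counts.values()) for counts in by_cluster.values())
    return majority / len(true_labels)


def rand_index(true_labels, cluster_labels):
    """Fraction of point pairs on which the two partitions agree."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j])
        == (cluster_labels[i] == cluster_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def size_cv(labels):
    """Coefficient of variation (sample std / mean) of the group sizes.

    Hypothetical proxy for the uniform effect; not taken from the chapter.
    """
    sizes = list(Counter(labels).values())
    if len(sizes) < 2:
        return 0.0
    mean = sum(sizes) / len(sizes)
    var = sum((s - mean) ** 2 for s in sizes) / (len(sizes) - 1)
    return sqrt(var) / mean


# Toy data (made up): one large class of 8 points and one small class of 2.
truth = ["a"] * 8 + ["b"] * 2
# A hypothetical clustering with two equal-sized clusters: the small class
# is absorbed into a cluster dominated by the large class.
clusters = [0] * 5 + [1] * 5

print("purity     :", purity(truth, clusters))
print("Rand index :", rand_index(truth, clusters))
print("CV of class sizes   :", size_cv(truth))
print("CV of cluster sizes :", size_cv(clusters))
```

On this toy input the two measures disagree noticeably: purity stays fairly high even though the small class is never recovered, while the Rand index is close to one half. This is the kind of inconsistency between measures that the abstract describes.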

Notes

  1. http://trec.nist.gov

  2. http://www.mathworks.cn/help/toolbox/stats/kmeans.html

  3. http://glaros.dtc.umn.edu/gkhome/views/cluto

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Wu, J. (2012). Selecting External Validation Measures for K-means Clustering. In: Advances in K-means Clustering. Springer Theses. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29807-3_5

  • DOI: https://doi.org/10.1007/978-3-642-29807-3_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29806-6

  • Online ISBN: 978-3-642-29807-3

  • eBook Packages: Computer Science (R0)