Performance Guarantees for Hierarchical Clustering

  • Sanjoy Dasgupta
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2375)


We show that for any data set in any metric space, it is possible to construct a hierarchical clustering with the guarantee that for every k, the induced k-clustering has cost at most eight times that of the optimal k-clustering. Here the cost of a clustering is taken to be the maximum radius of its clusters. Our algorithm is similar in simplicity and efficiency to common heuristics for hierarchical clustering, and we show that these heuristics have poorer approximation factors.
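To make the cost measure concrete, the following is a minimal sketch of the maximum-radius cost of a k-clustering. It pairs that cost with farthest-first traversal, a standard greedy heuristic for choosing k centers. This is an illustration only, not the hierarchical algorithm the paper constructs. The function names `farthest_first` and `max_radius_cost` are hypothetical, centers are assumed to be data points, and Euclidean distance stands in for an arbitrary metric.

```python
from math import dist  # Euclidean distance; a stand-in for any metric (assumption)


def farthest_first(points, k):
    """Greedily choose k center indices: each new center is the point
    farthest from the centers chosen so far (illustrative heuristic)."""
    centers = [0]  # arbitrary starting point: index 0
    # d[i] = distance from points[i] to its nearest chosen center
    d = [dist(p, points[0]) for p in points]
    while len(centers) < k:
        nxt = max(range(len(points)), key=d.__getitem__)
        centers.append(nxt)
        d = [min(d[i], dist(points[i], points[nxt])) for i in range(len(points))]
    return centers


def max_radius_cost(points, centers):
    """Cost of the clustering induced by assigning every point to its
    nearest center: the largest point-to-center distance."""
    return max(min(dist(p, points[c]) for c in centers) for p in points)


pts = [(0, 0), (1, 0), (10, 0), (11, 0)]
for k in (1, 2):
    print(k, max_radius_cost(pts, farthest_first(pts, k)))
```

On this toy input the cost drops sharply from k = 1 to k = 2, since the four points form two well-separated pairs.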


Hierarchical Cluster, Cluster Center, Approximation Ratio, Parent Function, Performance Guarantee
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Sanjoy Dasgupta (1)
  1. University of California, Berkeley
