Performance Guarantees for Hierarchical Clustering
We show that for any data set in any metric space, it is possible to construct a hierarchical clustering with the guarantee that for every k, the induced k-clustering has cost at most eight times that of the optimal k-clustering. Here the cost of a clustering is taken to be the maximum radius of its clusters. Our algorithm is similar in simplicity and efficiency to common heuristics for hierarchical clustering, and we show that these heuristics have poorer approximation factors.
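To make the cost measure concrete, here is a minimal sketch of "maximum radius" for a single k-clustering. It assumes each cluster is represented by a center drawn from the data and that each point is assigned to its nearest center. The helper names `clustering_cost` and `farthest_first` are illustrative and are not taken from the paper. The greedy farthest-first selection is a standard k-center heuristic, used here only to produce some centers to evaluate; it is not claimed to be the hierarchical construction described above.

```python
import math

def clustering_cost(points, centers, dist):
    """Maximum radius: the largest distance from any point to its nearest center."""
    return max(min(dist(p, c) for c in centers) for p in points)

def farthest_first(points, k, dist):
    """Greedily pick k centers, each one farthest from the centers chosen so far."""
    centers = [points[0]]  # arbitrary starting point (an assumption of this sketch)
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(nxt)
    return centers

# Toy data on a line; math.dist is the Euclidean metric (Python 3.8+).
points = [(0, 0), (1, 0), (10, 0), (11, 0), (20, 0)]
centers = farthest_first(points, 3, math.dist)
cost = clustering_cost(points, centers, math.dist)
```

Any metric can be passed as `dist`, which matches the abstract's setting of an arbitrary metric space.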
Keywords: Hierarchical Cluster, Cluster Center, Approximation Ratio, Parent Function, Performance Guarantee