Bregman Clustering for Separable Instances

Ackermann, Marcel R.; Blömer, Johannes

doi:10.1007/978-3-642-13731-0_21

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6139)

Included in the following conference series:

Scandinavian Workshop on Algorithm Theory

Abstract

The Bregman k-median problem is defined as follows. Given a Bregman divergence \(D_\varphi\) and a finite set \(P \subseteq {\mathbb R}^d\) of size n, our goal is to find a set C of size k such that the sum of errors \(\mathrm{cost}(P,C) = \sum_{p \in P} \min_{c \in C} D_\varphi(p,c)\) is minimized. The Bregman k-median problem plays an important role in many applications, e.g., information theory, statistics, text classification, and speech processing. We study a generalization of the k-means++ seeding of Arthur and Vassilvitskii (SODA ’07). We prove for an almost arbitrary Bregman divergence that if the input set consists of k well-separated clusters, then with probability \(2^{-{\mathcal O}(k)}\) this seeding step alone finds an \({\mathcal O}(1)\)-approximate solution. Thereby, we generalize an earlier result of Ostrovsky et al. (FOCS ’06) from the case of the Euclidean k-means problem to the Bregman k-median problem. Additionally, this result leads to a constant-factor approximation algorithm for the Bregman k-median problem using at most \(2^{{\mathcal O}(k)}n\) arithmetic operations, including evaluations of the Bregman divergence \(D_\varphi\).
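
To make the objective and the seeding step concrete, here is a minimal Python sketch. It assumes the standard definition of a Bregman divergence, \(D_\varphi(p,q) = \varphi(p) - \varphi(q) - \langle \nabla\varphi(q), p - q \rangle\), and it assumes that the generalized seeding simply replaces the squared Euclidean distance of k-means++ with \(D_\varphi\) when sampling new centers. The abstract does not spell out these details, so the function names (`bregman_divergence`, `cost`, `bregman_seeding`) and the sampling rule are illustrative assumptions, not the authors' implementation.

```python
import random


def bregman_divergence(phi, grad_phi, p, q):
    """D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q> (standard definition)."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_phi(q), p, q))
    return phi(p) - phi(q) - inner


def cost(points, centers, div):
    """cost(P, C): sum over all p in P of the divergence to the closest center."""
    return sum(min(div(p, c) for c in centers) for p in points)


def bregman_seeding(points, k, div, rng):
    """Assumed k-means++-style seeding with a divergence `div`.

    The first center is drawn uniformly from the input; every further center
    is drawn with probability proportional to the point's current error.
    """
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Clamp at zero to guard against tiny negative rounding errors.
        weights = [max(0.0, min(div(p, c) for c in centers)) for p in points]
        if sum(weights) <= 0.0:
            # Every point already coincides with a center: fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


# Example generator: phi(x) = ||x||^2 yields the squared Euclidean distance,
# i.e. the Euclidean k-means special case mentioned in the abstract.
def sq_norm(x):
    return sum(xi * xi for xi in x)


def grad_sq_norm(x):
    return [2.0 * xi for xi in x]


def sq_euclidean(p, q):
    return bregman_divergence(sq_norm, grad_sq_norm, p, q)


# Hypothetical toy input with two well-separated clusters.
points = [(0, 0), (0, 1), (1, 0), (100, 100), (100, 101), (101, 100)]
seeds = bregman_seeding(points, k=2, div=sq_euclidean, rng=random.Random(0))
seed_cost = cost(points, seeds, sq_euclidean)
```

Other generators can be plugged in the same way by supplying a different `phi` and `grad_phi`. The sketch costs O(nk) divergence evaluations per seeding run and makes no attempt to reproduce the success-probability analysis or the repetition scheme behind the \(2^{{\mathcal O}(k)}n\) bound.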

Research supported by Deutsche Forschungsgemeinschaft (DFG), grant BL-314/6-1.

References

Arora, S., Raghavan, P., Rao, S.: Approximation schemes for Euclidean k-medians and related problems. In: Proceedings of the 30th Annual ACM Symposium on Theory of Computing (STOC ’98), pp. 106–113 (1998)
Kolliopoulos, S.G., Rao, S.: A nearly linear-time approximation scheme for the Euclidean κ-median problem. In: Nešetřil, J. (ed.) ESA 1999. LNCS, vol. 1643, pp. 378–389. Springer, Heidelberg (1999)
Bădoiu, M., Har-Peled, S., Indyk, P.: Approximate clustering via core-sets. In: Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC’02), pp. 250–257. Association for Computing Machinery (2002)
Har-Peled, S., Mazumdar, S.: On coresets for k-means and k-median clustering. In: Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC’04), pp. 291–300. Association for Computing Machinery (2004)
Kumar, A., Sabharwal, Y., Sen, S.: Linear time algorithms for clustering problems in any dimensions. In: Caires, L., Italiano, G.F., Monteiro, L., Palamidessi, C., Yung, M. (eds.) ICALP 2005. LNCS, vol. 3580, pp. 1374–1385. Springer, Heidelberg (2005)
Chen, K.: On k-median clustering in high dimensions. In: Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’06), pp. 1177–1185. Society for Industrial and Applied Mathematics (2006)
Matoušek, J.: On approximate geometric k-clustering. Discrete and Computational Geometry 24(1), 61–84 (2000)
Fernandez de la Vega, W., Karpinski, M., Kenyon, C., Rabani, Y.: Approximation schemes for clustering problems. In: Proceedings of the 35th Annual ACM Symposium on Theory of Computing (STOC’03), pp. 50–58. Association for Computing Machinery (2003)
Kumar, A., Sabharwal, Y., Sen, S.: A simple linear time (1+ε)-approximation algorithm for k-means clustering in any dimensions. In: Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science (FOCS ’04), pp. 454–462. IEEE Computer Society, Los Alamitos (2004)
Chen, K.: On coresets for k-median and k-means clustering in metric and Euclidean spaces and their applications. SIAM Journal on Computing 39(3), 923–947 (2009)
Feldman, D., Monemizadeh, M., Sohler, C.: A PTAS for k-means clustering based on weak coresets. In: Proceedings of the 23rd ACM Symposium on Computational Geometry (SCG ’07), pp. 11–18. Association for Computing Machinery (2007)
Ackermann, M.R., Blömer, J., Sohler, C.: Clustering for metric and non-metric distance measures. In: Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’08), pp. 799–808. Society for Industrial and Applied Mathematics (2008); Full version to appear in ACM Transactions on Algorithms (special issue on SODA ’08).
Ackermann, M.R., Blömer, J.: Coresets and approximate clustering for Bregman divergences. In: Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’09), pp. 1088–1097. Society for Industrial and Applied Mathematics (2009)
Lloyd, S.P.: Least squares quantization in PCM. IEEE Transactions on Information Theory 28(2), 129–137 (1982)
Arthur, D., Manthey, B., Röglin, H.: k-means has polynomial smoothed complexity. In: Proceedings of the 50th Symposium on Foundations of Computer Science (FOCS ’09). IEEE Computer Society Press, Los Alamitos (2009) (to appear)
Vattani, A.: k-means requires exponentially many iterations even in the plane. In: Proceedings of the 25th Annual Symposium on Computational Geometry (SCG ’09), pp. 324–332. Association for Computing Machinery (2009)
Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’07), pp. 1027–1035. Society for Industrial and Applied Mathematics (2007)
Ostrovsky, R., Rabani, Y., Schulman, L.J., Swamy, C.: The effectiveness of Lloyd-type methods for the k-means problem. In: Proceedings of the 47th Annual Symposium on Foundations of Computer Science (FOCS ’06), pp. 165–176. IEEE Computer Society, Los Alamitos (2006)
Aggarwal, A., Deshpande, A., Kannan, R.: Adaptive sampling for k-means clustering. In: Proceedings of the 12th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX ’09), pp. 15–28. Springer, Heidelberg (2009)
Banerjee, A., Merugu, S., Dhillon, I.S., Ghosh, J.: Clustering with Bregman divergences. Journal of Machine Learning Research 6, 1705–1749 (2005)
Banerjee, A., Guo, X., Wang, H.: On the optimality of conditional expectation as a Bregman predictor. IEEE Transactions on Information Theory 51(7), 2664–2669 (2005)
Manthey, B., Röglin, H.: Worst-case and smoothed analysis of k-means clustering with Bregman divergences. In: Dong, Y., Du, D.-Z., Ibarra, O. (eds.) ISAAC 2009. LNCS, vol. 5878, pp. 1024–1033. Springer, Heidelberg (2009)
Nock, R., Luosto, P., Kivinen, J.: Mixed Bregman clustering with approximation guarantees. In: Daelemans, W., Goethals, B., Morik, K. (eds.) ECML PKDD 2008, Part II. LNCS (LNAI), vol. 5212, pp. 154–169. Springer, Heidelberg (2008)
Sra, S., Jegelka, S., Banerjee, A.: Approximation algorithms for Bregman clustering, co-clustering and tensor clustering. Technical Report MPIK-TR-177, Max Planck Institute for Biological Cybernetics (2008)
Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu, A.Y.: An efficient k-means clustering algorithm: Analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7), 881–892 (2002)
Ben-Hur, A., Elisseeff, A., Guyon, I.: A stability based method for discovering structure in clustered data. In: Proceedings of the 7th Pacific Symposium on Biocomputing (PSB ’02), pp. 6–17. World Scientific, Singapore (2002)
Bregman, L.M.: The relaxation method of finding the common points of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics 7, 200–217 (1967)
Mahalanobis, P.C.: On the generalized distance in statistics. In: Proceedings of the National Institute of Sciences of India, vol. 2(1), pp. 49–55. Indian National Science Academy (1936)
Ackermann, M.R.: Algorithms for the Bregman k-Median Problem. PhD thesis, University of Paderborn, Department of Computer Science (2009)

Author information

Authors and Affiliations

Department of Computer Science, University of Paderborn, Germany
Marcel R. Ackermann & Johannes Blömer

Editor information

Editors and Affiliations

School of Computer Science, Tel Aviv University, 69978, Tel Aviv, Israel
Haim Kaplan

About this paper

Cite this paper

Ackermann, M.R., Blömer, J. (2010). Bregman Clustering for Separable Instances. In: Kaplan, H. (eds) Algorithm Theory - SWAT 2010. SWAT 2010. Lecture Notes in Computer Science, vol 6139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13731-0_21

DOI: https://doi.org/10.1007/978-3-642-13731-0_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13730-3
Online ISBN: 978-3-642-13731-0
eBook Packages: Computer Science, Computer Science (R0)
