Bregman Clustering for Separable Instances

  • Conference paper
Algorithm Theory - SWAT 2010 (SWAT 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6139)

Abstract

The Bregman k-median problem is defined as follows. Given a Bregman divergence \(D_\varphi\) and a finite set \(P \subseteq {\mathbb R}^d\) of size n, our goal is to find a set C of size k such that the sum of errors \(\mathrm{cost}(P,C) = \sum_{p \in P} \min_{c \in C} D_\varphi(p,c)\) is minimized. The Bregman k-median problem plays an important role in many applications, e.g., information theory, statistics, text classification, and speech processing. We study a generalization of the k-means++ seeding of Arthur and Vassilvitskii (SODA ’07). We prove for an almost arbitrary Bregman divergence that if the input set consists of k well separated clusters, then with probability \(2^{-{\mathcal O}(k)}\) this seeding step alone finds an \({\mathcal O}(1)\)-approximate solution. Thereby, we generalize an earlier result of Ostrovsky et al. (FOCS ’06) from the case of the Euclidean k-means problem to the Bregman k-median problem. Additionally, this result leads to a constant factor approximation algorithm for the Bregman k-median problem using at most \(2^{{\mathcal O}(k)}n\) arithmetic operations, including evaluations of the Bregman divergence \(D_\varphi\).
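
To make the objective and the seeding step concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the usual reading of k-means++ style seeding carried over to a divergence \(D_\varphi\): the first center is drawn uniformly from P, and each further center is drawn from P with probability proportional to its current error \(\min_{c \in C} D_\varphi(p,c)\). The function names (`bregman_seeding`, `cost`, `kl_divergence`, `squared_euclidean`) are illustrative, and the two divergences are standard examples of Bregman divergences rather than choices made in the abstract.

```python
import math
import random


def squared_euclidean(p, c):
    """Squared Euclidean distance, the Bregman divergence behind k-means."""
    return sum((pi - ci) ** 2 for pi, ci in zip(p, c))


def kl_divergence(p, c):
    """Generalized Kullback-Leibler divergence on strictly positive vectors."""
    value = sum(pi * math.log(pi / ci) - pi + ci for pi, ci in zip(p, c))
    return max(0.0, value)  # guard against tiny negative rounding errors


def cost(points, centers, divergence):
    """cost(P, C): sum over p in P of the minimum over c in C of D(p, c)."""
    return sum(min(divergence(p, c) for c in centers) for p in points)


def bregman_seeding(points, k, divergence, rng=None):
    """Pick k centers from `points` by divergence-weighted sampling (assumed scheme)."""
    rng = rng or random.Random()
    centers = [rng.choice(points)]  # first center: uniform at random
    # errors[i] holds the divergence from points[i] to its closest chosen center
    errors = [divergence(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(errors) <= 0:
            # every point already coincides with a center; fall back to uniform
            new_center = rng.choice(points)
        else:
            new_center = rng.choices(points, weights=errors, k=1)[0]
        centers.append(new_center)
        errors = [min(e, divergence(p, new_center)) for p, e in zip(points, errors)]
    return centers


if __name__ == "__main__":
    # Two well separated groups of positive points (toy data for illustration).
    data = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (9.0, 9.2), (9.1, 8.8), (8.9, 9.0)]
    seeds = bregman_seeding(data, 2, kl_divergence, random.Random(0))
    print(seeds, cost(data, seeds, kl_divergence))
```

Note that the divergence is evaluated as D(p, c) with the center as the second argument, matching the cost function above; Bregman divergences are generally asymmetric, so the argument order matters.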

Research supported by Deutsche Forschungsgemeinschaft (DFG), grant BL-314/6-1.

References

  1. Arora, S., Raghavan, P., Rao, S.: Approximation schemes for Euclidean k-medians and related problems. In: Proceedings of the 30th Annual ACM Symposium on Theory of Computing (STOC ’98), pp. 106–113 (1998)

  2. Kolliopoulos, S.G., Rao, S.: A nearly linear-time approximation scheme for the Euclidean k-median problem. In: Nešetřil, J. (ed.) ESA 1999. LNCS, vol. 1643, pp. 378–389. Springer, Heidelberg (1999)

  3. Bădoiu, M., Har-Peled, S., Indyk, P.: Approximate clustering via core-sets. In: Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC’02), pp. 250–257. Association for Computing Machinery (2002)

  4. Har-Peled, S., Mazumdar, S.: On coresets for k-means and k-median clustering. In: Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC’04), pp. 291–300. Association for Computing Machinery (2004)

  5. Kumar, A., Sabharwal, Y., Sen, S.: Linear time algorithms for clustering problems in any dimensions. In: Caires, L., Italiano, G.F., Monteiro, L., Palamidessi, C., Yung, M. (eds.) ICALP 2005. LNCS, vol. 3580, pp. 1374–1385. Springer, Heidelberg (2005)

  6. Chen, K.: On k-median clustering in high dimensions. In: Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’06), pp. 1177–1185. Society for Industrial and Applied Mathematics (2006)

  7. Matoušek, J.: On approximate geometric k-clustering. Discrete and Computational Geometry 24(1), 61–84 (2000)

  8. Fernandez de la Vega, W., Karpinski, M., Kenyon, C., Rabani, Y.: Approximation schemes for clustering problems. In: Proceedings of the 35th Annual ACM Symposium on Theory of Computing (STOC’03), pp. 50–58. Association for Computing Machinery (2003)

  9. Kumar, A., Sabharwal, Y., Sen, S.: A simple linear time (1+ε)-approximation algorithm for k-means clustering in any dimensions. In: Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science (FOCS ’04), pp. 454–462. IEEE Computer Society, Los Alamitos (2004)

  10. Chen, K.: On coresets for k-median and k-means clustering in metric and Euclidean spaces and their applications. SIAM Journal on Computing 39(3), 923–947 (2009)

  11. Feldman, D., Monemizadeh, M., Sohler, C.: A PTAS for k-means clustering based on weak coresets. In: Proceedings of the 23rd ACM Symposium on Computational Geometry (SCG ’07), pp. 11–18. Association for Computing Machinery (2007)

  12. Ackermann, M.R., Blömer, J., Sohler, C.: Clustering for metric and non-metric distance measures. In: Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’08), pp. 799–808. Society for Industrial and Applied Mathematics (2008); Full version to appear in ACM Transactions on Algorithms (special issue on SODA ’08).

  13. Ackermann, M.R., Blömer, J.: Coresets and approximate clustering for Bregman divergences. In: Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’09), pp. 1088–1097. Society for Industrial and Applied Mathematics (2009)

  14. Lloyd, S.P.: Least squares quantization in PCM. IEEE Transactions on Information Theory 28(2), 129–137 (1982)

  15. Arthur, D., Manthey, B., Röglin, H.: k-means has polynomial smoothed complexity. In: Proceedings of the 50th Symposium on Foundations of Computer Science (FOCS ’09). IEEE Computer Society Press, Los Alamitos (2009) (to appear)

  16. Vattani, A.: k-means requires exponentially many iterations even in the plane. In: Proceedings of the 25th Annual Symposium on Computational Geometry (SCG ’09), pp. 324–332. Association for Computing Machinery (2009)

  17. Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’07), pp. 1027–1035. Society for Industrial and Applied Mathematics (2007)

  18. Ostrovsky, R., Rabani, Y., Schulman, L.J., Swamy, C.: The effectiveness of Lloyd-type methods for the k-means problem. In: Proceedings of the 47th Annual Symposium on Foundations of Computer Science (FOCS ’06), pp. 165–176. IEEE Computer Society, Los Alamitos (2006)

  19. Aggarwal, A., Deshpande, A., Kannan, R.: Adaptive sampling for k-means clustering. In: Proceedings of the 12th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX ’09), pp. 15–28. Springer, Heidelberg (2009)

  20. Banerjee, A., Merugu, S., Dhillon, I.S., Ghosh, J.: Clustering with Bregman divergences. Journal of Machine Learning Research 6, 1705–1749 (2005)

  21. Banerjee, A., Guo, X., Wang, H.: On the optimality of conditional expectation as a Bregman predictor. IEEE Transactions on Information Theory 51(7), 2664–2669 (2005)

  22. Manthey, B., Röglin, H.: Worst-case and smoothed analysis of k-means clustering with Bregman divergences. In: Dong, Y., Du, D.-Z., Ibarra, O. (eds.) ISAAC 2009. LNCS, vol. 5878, pp. 1024–1033. Springer, Heidelberg (2009)

  23. Nock, R., Luosto, P., Kivinen, J.: Mixed Bregman clustering with approximation guarantees. In: Daelemans, W., Goethals, B., Morik, K. (eds.) ECML PKDD 2008, Part II. LNCS (LNAI), vol. 5212, pp. 154–169. Springer, Heidelberg (2008)

  24. Sra, S., Jegelka, S., Banerjee, A.: Approximation algorithms for Bregman clustering, co-clustering and tensor clustering. Technical Report MPIK-TR-177, Max Planck Institute for Biological Cybernetics (2008)

  25. Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu, A.Y.: An efficient k-means clustering algorithm: Analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7), 881–892 (2002)

  26. Ben-Hur, A., Elisseeff, A., Guyon, I.: A stability based method for discovering structure in clustered data. In: Proceedings of the 7th Pacific Symposium on Biocomputing (PSB ’02), pp. 6–17. World Scientific, Singapore (2002)

  27. Bregman, L.M.: The relaxation method of finding the common points of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics 7, 200–217 (1967)

  28. Mahalanobis, P.C.: On the generalized distance in statistics. In: Proceedings of the National Institute of Sciences of India, vol. 2(1), pp. 49–55. Indian National Science Academy (1936)

  29. Ackermann, M.R.: Algorithms for the Bregman k-Median Problem. PhD thesis, University of Paderborn, Department of Computer Science (2009)

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ackermann, M.R., Blömer, J. (2010). Bregman Clustering for Separable Instances. In: Kaplan, H. (eds) Algorithm Theory - SWAT 2010. SWAT 2010. Lecture Notes in Computer Science, vol 6139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13731-0_21

  • DOI: https://doi.org/10.1007/978-3-642-13731-0_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13730-3

  • Online ISBN: 978-3-642-13731-0

  • eBook Packages: Computer Science, Computer Science (R0)
