Abstract
We consider a framework in which the clustering algorithm receives as input a sample generated i.i.d. by an unknown, arbitrary distribution, and must output a clustering of the full domain set that is evaluated with respect to the underlying distribution. We provide general conditions on clustering problems that imply the existence of sampling-based clusterings approximating the optimal clustering. We show that K-median clustering, as well as the Vector Quantization problem, satisfies these conditions. In particular, our results apply to the sampling-based approximate clustering scenario. As a corollary, we obtain a sampling-based algorithm for the K-median clustering problem that finds an almost-optimal set of centers in time depending only on the confidence and accuracy parameters of the approximation, and independent of the input size. Furthermore, for Euclidean inputs, the running time of our algorithm is also independent of the Euclidean dimension.
© 2004 Springer-Verlag Berlin Heidelberg
Cite this paper
Ben-David, S. (2004). A Framework for Statistical Clustering with a Constant Time Approximation Algorithms for K-Median Clustering. In: Shawe-Taylor, J., Singer, Y. (eds) Learning Theory. COLT 2004. Lecture Notes in Computer Science(), vol 3120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27819-1_29
DOI: https://doi.org/10.1007/978-3-540-27819-1_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22282-8
Online ISBN: 978-3-540-27819-1