
A Framework for Statistical Clustering with a Constant Time Approximation Algorithms for K-Median Clustering

  • Conference paper
Learning Theory (COLT 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3120)

Abstract

We consider a framework in which the clustering algorithm gets as input a sample drawn i.i.d. from some unknown, arbitrary distribution, and must output a clustering of the full domain set, which is evaluated with respect to the underlying distribution. We provide general conditions on clustering problems that imply the existence of sampling-based clusterings that approximate the optimal clustering. We show that K-median clustering, as well as the Vector Quantization problem, satisfies these conditions. In particular, our results apply to the sampling-based approximate clustering scenario. As a corollary, we get a sampling-based algorithm for the K-median clustering problem that finds an almost optimal set of centers in time depending only on the confidence and accuracy parameters of the approximation, but independent of the input size. Furthermore, in the Euclidean input case, the running time of our algorithm is independent of the Euclidean dimension.
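
The sampling-based scenario in the abstract can be illustrated with a small sketch. This is not the paper's algorithm and does not use its sample-size bound: the function names `kmedian_cost` and `sample_kmedian`, the one-dimensional data, the fixed `sample_size`, and the exhaustive search over sample points as candidate centers are all assumptions made for illustration. The point is only that the search cost depends on the sample size and on K, not on the size of the full data set.

```python
import itertools
import random


def kmedian_cost(points, centers):
    """Average distance from each point to its nearest center (1-D data)."""
    return sum(min(abs(p - c) for c in centers) for p in points) / len(points)


def sample_kmedian(data, k, sample_size, rng):
    """Choose k centers by exhaustive search over a small i.i.d. sample.

    Hypothetical helper: candidate centers are restricted to sample points,
    and the K-median cost is minimised on the sample only.
    """
    # i.i.d. draws (with replacement) from the empirical distribution of `data`
    sample = [rng.choice(data) for _ in range(sample_size)]
    candidates = sorted(set(sample))
    k_eff = min(k, len(candidates))  # guard against too few distinct points
    best = min(
        itertools.combinations(candidates, k_eff),
        key=lambda centers: kmedian_cost(sample, centers),
    )
    return list(best)


# Usage: two well-separated groups; the search touches only 40 sampled points.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(5000)]
data += [rng.gauss(100.0, 1.0) for _ in range(5000)]
centers = sample_kmedian(data, k=2, sample_size=40, rng=rng)
print("centers:", centers)
print("cost on full data:", kmedian_cost(data, centers))
```

The final line evaluates the chosen centers against the full data set, which mirrors the framework's requirement that the output be judged with respect to the underlying distribution rather than the sample.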




Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ben-David, S. (2004). A Framework for Statistical Clustering with a Constant Time Approximation Algorithms for K-Median Clustering. In: Shawe-Taylor, J., Singer, Y. (eds) Learning Theory. COLT 2004. Lecture Notes in Computer Science, vol 3120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27819-1_29

  • DOI: https://doi.org/10.1007/978-3-540-27819-1_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22282-8

  • Online ISBN: 978-3-540-27819-1

  • eBook Packages: Springer Book Archive
