Abstract
We consider a framework in which the clustering algorithm receives as input a sample generated i.i.d. by an unknown, arbitrary distribution, and must output a clustering of the full domain set that is evaluated with respect to the underlying distribution. We provide general conditions on clustering problems that imply the existence of sampling-based clusterings approximating the optimal clustering. We show that K-median clustering, as well as the Vector Quantization problem, satisfies these conditions. In particular, our results apply to the sampling-based approximate clustering scenario. As a corollary, we obtain a sampling-based algorithm for the K-median clustering problem that finds an almost-optimal set of centers in time depending only on the confidence and accuracy parameters of the approximation, and independent of the input size. Furthermore, for Euclidean inputs, the running time of our algorithm is also independent of the Euclidean dimension.
© 2004 Springer-Verlag Berlin Heidelberg
Cite this paper
Ben-David, S. (2004). A Framework for Statistical Clustering with a Constant Time Approximation Algorithms for K-Median Clustering. In: Shawe-Taylor, J., Singer, Y. (eds) Learning Theory. COLT 2004. Lecture Notes in Computer Science(), vol 3120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27819-1_29
DOI: https://doi.org/10.1007/978-3-540-27819-1_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22282-8
Online ISBN: 978-3-540-27819-1