Abstract
We consider the well-known strongly NP-hard K-Means problem. In this problem, one needs to partition a finite set of N points in Euclidean space into K non-empty clusters so as to minimize the sum, over all clusters, of the intracluster sums of squared distances between the elements of each cluster and its center. The center of a cluster is defined as its centroid (geometric center). We analyze the polynomially solvable one-dimensional case of the problem and propose a novel parameterized approach to it. Within this approach, we first introduce a new parameterized formulation of the problem for this case; second, we show that the proposed algorithm finds an optimal partition of the input data and, in contrast to existing approaches and algorithms, simultaneously finds an optimal number of clusters in \(\mathcal{O}(N)\) time.
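To make the objective concrete, the following is a minimal sketch of the classical dynamic-programming solution for the one-dimensional case with a fixed K. It is not the linear-time parameterized algorithm of the paper, whose formulation is not given in the abstract. It relies on the standard fact that, in one dimension, the clusters of an optimal partition are contiguous runs of the sorted points, and it runs in O(K N^2) time. The function name `kmeans_1d` and its interface are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def kmeans_1d(points: Sequence[float], k: int) -> Tuple[float, List[List[float]]]:
    """Classical O(k * n^2) dynamic program for exact 1D K-Means (illustrative sketch)."""
    xs = sorted(points)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(points)")

    # Prefix sums of x and x^2 give the cost of any contiguous run in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for idx, x in enumerate(xs):
        s1[idx + 1] = s1[idx] + x
        s2[idx + 1] = s2[idx] + x * x

    def cost(i: int, j: int) -> float:
        # Sum of squared distances from xs[i:j] to its centroid (requires i < j).
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[m][j]: minimal objective for splitting the first j points into m clusters.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                candidate = best[m - 1][i] + cost(i, j)
                if candidate < best[m][j]:
                    best[m][j] = candidate
                    cut[m][j] = i

    # Walk the stored cut positions backwards to recover the clusters.
    clusters: List[List[float]] = []
    j = n
    for m in range(k, 0, -1):
        i = cut[m][j]
        clusters.append(xs[i:j])
        j = i
    clusters.reverse()
    return best[k][n], clusters
```

For example, calling `kmeans_1d([1.0, 2.0, 10.0, 11.0, 20.0], 3)` groups the points as {1, 2}, {10, 11}, {20}. Note that in this classical formulation K is an input, whereas the approach described in the abstract also optimizes the number of clusters.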
Acknowledgments
The study was supported by the Russian Foundation for Basic Research, projects 19-01-00308, 19-07-00397, and 18-31-00398, by the Russian Academy of Science (the Program of basic research), project 0314-2019-0015, and by the Russian Ministry of Science and Education under the 5-100 Excellence Programme.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Kel’manov, A., Khandeev, V. (2020). Exact Linear-Time Algorithm for Parameterized K-Means Problem with Optimized Number of Clusters in the 1D Case. In: Sergeyev, Y., Kvasov, D. (eds) Numerical Computations: Theory and Algorithms. NUMTA 2019. Lecture Notes in Computer Science, vol 11974. Springer, Cham. https://doi.org/10.1007/978-3-030-40616-5_35
DOI: https://doi.org/10.1007/978-3-030-40616-5_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-40615-8
Online ISBN: 978-3-030-40616-5
eBook Packages: Computer Science, Computer Science (R0)