Exact Linear-Time Algorithm for Parameterized K-Means Problem with Optimized Number of Clusters in the 1D Case

  • Alexander Kel’manov
  • Vladimir Khandeev
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11974)

Abstract

We consider the well-known strongly NP-hard K-Means problem. In this problem, one needs to partition a finite set of N points in Euclidean space into K non-empty clusters so as to minimize the sum, over all clusters, of the intracluster sums of squared distances between the elements of each cluster and its center. The center of a cluster is defined as its centroid (geometrical center). We analyze the polynomially solvable one-dimensional case of the problem and propose a novel parameterized approach to this case. Within the framework of this approach, we first introduce a new parameterized formulation of the problem for this case. Second, we show that our approach and the proposed algorithm allow one to find an optimal partition of the input data and, contrary to existing approaches and algorithms, simultaneously find an optimal number of clusters in \(\mathcal {O}(N)\) time.
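
To make the objective concrete, the following is a minimal Python sketch of the classical dynamic-programming solution for the one-dimensional case with a fixed K. It is not the linear-time parameterized algorithm proposed in the paper, whose formulation is not given in this abstract. It relies on the known property that, in 1D, optimal clusters are contiguous runs of the sorted input. The function name `kmeans_1d` and its signature are illustrative assumptions.

```python
from typing import List, Tuple


def kmeans_1d(points: List[float], k: int) -> Tuple[float, List[List[float]]]:
    """Exact 1D K-Means for fixed k by dynamic programming, O(k * n^2) time.

    Illustrative baseline only; not the paper's O(N) parameterized algorithm.
    """
    xs = sorted(points)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(points)")

    # Prefix sums of values and squares give any segment's cost in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def seg_cost(i: int, j: int) -> float:
        """Sum of squared distances from xs[i:j] to its centroid."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[c][j]: minimal cost of splitting xs[:j] into c non-empty clusters.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = best[c - 1][i] + seg_cost(i, j)
                if cand < best[c][j]:
                    best[c][j] = cand
                    cut[c][j] = i

    # Walk the stored cut positions back to recover the clusters.
    clusters: List[List[float]] = []
    j = n
    for c in range(k, 0, -1):
        i = cut[c][j]
        clusters.append(xs[i:j])
        j = i
    clusters.reverse()
    return best[k][n], clusters
```

For example, calling `kmeans_1d([10.0, 1.0, 20.0, 2.0, 11.0], 3)` groups the points as `[1.0, 2.0]`, `[10.0, 11.0]`, `[20.0]` with total cost 1.0. Here K is an input. The approach described in the abstract differs in that it also determines the number of clusters.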

Keywords

K-Means · One-dimensional case · Parameterized approach · Linear-time algorithm

Notes

Acknowledgments

The study was supported by the Russian Foundation for Basic Research, projects 19-01-00308, 19-07-00397, and 18-31-00398, by the Russian Academy of Science (the Program of basic research), project 0314-2019-0015, and by the Russian Ministry of Science and Education under the 5-100 Excellence Programme.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Sobolev Institute of Mathematics, Novosibirsk, Russia
  2. Novosibirsk State University, Novosibirsk, Russia