An Exact Algorithm of Searching for the Largest Size Cluster in an Integer Sequence 2-Clustering Problem

  • Alexander Kel’manov
  • Sergey Khamidullin
  • Vladimir Khandeev
  • Artem Pyatkin
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 974)

Abstract

A problem of partitioning a finite sequence of points in Euclidean space into two subsequences (clusters) maximizing the size of the first cluster subject to two constraints is considered. The first constraint deals with every two consecutive indices of elements of the first cluster: the difference between them is bounded from above and below by some constants. The second one restricts the value of a quadratic clustering function that is the sum of the intracluster sums over both clusters. The intracluster sum is the sum of squared distances between cluster elements and the cluster center. The center of the first cluster is unknown and determined as the centroid (i.e. as the mean value of its elements), while the center of the second one is zero.
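
The two constraints described above can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not code from the paper: the names `gaps_ok`, `clustering_cost`, `is_feasible`, `t_min`, `t_max` and `bound` are hypothetical, indices are 0-based, and the chosen indices are assumed to be strictly increasing.

```python
from typing import Sequence

Point = Sequence[float]


def gaps_ok(indices: Sequence[int], t_min: int, t_max: int) -> bool:
    """First constraint: every difference between consecutive chosen
    indices lies in [t_min, t_max] (hypothetical parameter names)."""
    return all(t_min <= b - a <= t_max for a, b in zip(indices, indices[1:]))


def clustering_cost(points: Sequence[Point], indices: Sequence[int]) -> float:
    """Quadratic clustering function: squared distances of the first
    cluster to its centroid plus squared distances of the rest to zero."""
    dim = len(points[0])
    chosen = set(indices)
    first = [points[i] for i in indices]
    if first:
        centroid = [sum(p[k] for p in first) / len(first) for k in range(dim)]
    else:
        centroid = [0.0] * dim
    origin = [0.0] * dim
    cost = 0.0
    for i, p in enumerate(points):
        center = centroid if i in chosen else origin
        cost += sum((p[k] - center[k]) ** 2 for k in range(dim))
    return cost


def is_feasible(points, indices, t_min, t_max, bound) -> bool:
    """Both constraints together; `bound` is the assumed upper limit
    on the clustering function."""
    return gaps_ok(indices, t_min, t_max) and clustering_cost(points, indices) <= bound


# One-dimensional toy sequence; the first cluster is {2, 4}, centroid 3.
pts = [(0,), (2,), (1,), (4,)]
print(clustering_cost(pts, [1, 3]))  # (2-3)^2 + (4-3)^2 + 0^2 + 1^2
```

The problem then asks for the longest index list for which `is_feasible` holds.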

The problem is shown to be strongly NP-hard, and an exact algorithm is proposed for the case where the input points have integer coordinates. If the space dimension is bounded by a constant, this algorithm runs in pseudopolynomial time.
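
For very small inputs, an exact answer can also be found by plain enumeration, which gives a reference point to check any faster method against. The sketch below is such a brute-force baseline and is not the algorithm of the paper: it takes exponential time. It relies on the standard identity that the clustering function equals Q − ‖S‖²/M, where Q is the sum of squared norms of all points, S is the sum of the first cluster's points and M is its size. With integer coordinates and an integer bound the test M·(Q − bound) ≤ ‖S‖² uses integers only. The function name and the parameters `t_min`, `t_max`, `bound` are hypothetical, and indices are 0-based.

```python
from itertools import combinations


def largest_first_cluster(points, t_min, t_max, bound):
    """Exhaustive search for a largest feasible first cluster.

    Exponential time; intended only as a reference for tiny inputs.
    Returns a list of 0-based indices, or [] if no non-empty cluster
    satisfies both constraints.
    """
    n = len(points)
    dim = len(points[0])
    total_sq = sum(c * c for p in points for c in p)  # Q in the identity
    for size in range(n, 0, -1):  # try the largest sizes first
        for idx in combinations(range(n), size):
            # Constraint on differences between consecutive indices.
            if any(not (t_min <= b - a <= t_max) for a, b in zip(idx, idx[1:])):
                continue
            s = [sum(points[i][k] for i in idx) for k in range(dim)]
            s_sq = sum(c * c for c in s)
            # Q - |S|^2 / size <= bound, rewritten without division.
            if size * (total_sq - bound) <= s_sq:
                return list(idx)
    return []


pts = [(0,), (2,), (1,), (4,)]
print(largest_first_cluster(pts, t_min=1, t_max=2, bound=3))
```

Because sizes are tried from largest to smallest, the first feasible index set found has maximum size.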

Keywords

Euclidean space, Sequence 2-partition, Longest subsequence, Quadratic variance, NP-hard problem, Integer coordinates, Exact algorithm, Fixed space dimension, Pseudopolynomial running time

Notes

Acknowledgments

The study presented in Sects. 3 and 5 was supported by the Russian Science Foundation, project 16-11-10041. The study presented in Sects. 2 and 4 was supported by the Russian Foundation for Basic Research, projects 16-07-00168 and 18-31-00398, by the Russian Academy of Science (the Program of basic research), project 0314-2016-0015, and by the Russian Ministry of Science and Education under the 5-100 Excellence Programme.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Sobolev Institute of Mathematics, Novosibirsk, Russia
  2. Novosibirsk State University, Novosibirsk, Russia
