Abstract
A problem of partitioning a finite sequence of points in Euclidean space into two subsequences (clusters) maximizing the size of the first cluster subject to two constraints is considered. The first constraint deals with every two consecutive indices of elements of the first cluster: the difference between them is bounded from above and below by some constants. The second one restricts the value of a quadratic clustering function that is the sum of the intracluster sums over both clusters. The intracluster sum is the sum of squared distances between cluster elements and the cluster center. The center of the first cluster is unknown and determined as the centroid (i.e. as the mean value of its elements), while the center of the second one is zero.
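The two constraints described above can be checked for a candidate first cluster with a few lines of code. The following is a minimal sketch, not the authors' implementation: the function names, the 0-based indexing, the parameters `t_min`, `t_max`, and `bound`, and the reading of "restricts" as an upper bound on the clustering function are all assumptions made for illustration.

```python
def clustering_cost(points, chosen):
    """Quadratic clustering function for a candidate first cluster.

    points: list of equal-length coordinate tuples (the input sequence).
    chosen: increasing list of 0-based indices forming the first cluster.
    The first cluster is centered at its centroid, the second at zero.
    """
    dim = len(points[0])
    chosen_set = set(chosen)
    origin = [0.0] * dim
    if chosen:
        centroid = [sum(points[i][k] for i in chosen) / len(chosen)
                    for k in range(dim)]
    else:
        centroid = origin
    cost = 0.0
    for i, p in enumerate(points):
        center = centroid if i in chosen_set else origin
        cost += sum((p[k] - center[k]) ** 2 for k in range(dim))
    return cost


def gaps_ok(chosen, t_min, t_max):
    """Every two consecutive chosen indices differ by at least t_min
    and at most t_max (assumed names for the two constants)."""
    return all(t_min <= b - a <= t_max for a, b in zip(chosen, chosen[1:]))


def is_feasible(points, chosen, t_min, t_max, bound):
    """Both constraints hold; `bound` is an assumed upper limit on the cost."""
    return gaps_ok(chosen, t_min, t_max) and clustering_cost(points, chosen) <= bound
```

For instance, with `points = [(0, 0), (3, 4), (0, 0), (5, 4)]` and `chosen = [1, 3]`, the centroid of the first cluster is (4, 4), each chosen point contributes 1, and the two zero points contribute nothing to the second cluster's sum.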
The strong NP-hardness of the problem is shown, and an exact algorithm is suggested for the case of integer coordinates of input points. If the space dimension is bounded by a constant, this algorithm runs in pseudopolynomial time.
Acknowledgments
The study presented in Sects. 3 and 5 was supported by the Russian Science Foundation, project 16-11-10041. The study presented in Sects. 2 and 4 was supported by the Russian Foundation for Basic Research, projects 16-07-00168 and 18-31-00398, by the Russian Academy of Science (the Program of basic research), project 0314-2016-0015, and by the Russian Ministry of Science and Education under the 5-100 Excellence Programme.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Kel’manov, A., Khamidullin, S., Khandeev, V., Pyatkin, A. (2019). An Exact Algorithm of Searching for the Largest Size Cluster in an Integer Sequence 2-Clustering Problem. In: Evtushenko, Y., Jaćimović, M., Khachay, M., Kochetov, Y., Malkova, V., Posypkin, M. (eds) Optimization and Applications. OPTIMA 2018. Communications in Computer and Information Science, vol 974. Springer, Cham. https://doi.org/10.1007/978-3-030-10934-9_10
DOI: https://doi.org/10.1007/978-3-030-10934-9_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-10933-2
Online ISBN: 978-3-030-10934-9
eBook Packages: Computer Science, Computer Science (R0)