An Exact Algorithm of Searching for the Largest Size Cluster in an Integer Sequence 2-Clustering Problem
We consider the problem of partitioning a finite sequence of points in Euclidean space into two subsequences (clusters) so as to maximize the size of the first cluster subject to two constraints. The first constraint concerns every two consecutive indices of elements of the first cluster: their difference is bounded from above and below by given constants. The second constraint restricts the value of a quadratic clustering function, namely the sum of the intracluster sums over both clusters, where an intracluster sum is the sum of squared distances between the cluster elements and the cluster center. The center of the first cluster is unknown and is defined as its centroid (i.e., the mean of its elements), while the center of the second cluster is the origin.
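To make the two constraints concrete, here is a minimal sketch that checks whether a candidate first cluster is feasible. It is an illustration only, not the algorithm of the paper. The names `clustering_cost`, `is_feasible`, `t_min`, `t_max`, and `bound` are hypothetical, indices are 0-based, and the sketch assumes the second constraint has the form "clustering function value does not exceed `bound`".

```python
from typing import Sequence

Point = Sequence[float]


def clustering_cost(points: Sequence[Point], first_cluster: Sequence[int]) -> float:
    """Sum of squared distances: first-cluster points to their centroid,
    all remaining points to the origin (assumed 0-based indices)."""
    dim = len(points[0])
    chosen = set(first_cluster)
    members = [points[i] for i in first_cluster]
    if members:
        # Centroid of the first cluster: coordinate-wise mean of its elements.
        centroid = [sum(p[k] for p in members) / len(members) for k in range(dim)]
    else:
        centroid = [0.0] * dim
    origin = [0.0] * dim
    cost = 0.0
    for i, p in enumerate(points):
        center = centroid if i in chosen else origin
        cost += sum((p[k] - center[k]) ** 2 for k in range(dim))
    return cost


def is_feasible(points: Sequence[Point], first_cluster: Sequence[int],
                t_min: int, t_max: int, bound: float) -> bool:
    """Check both constraints for an increasing list of indices.

    t_min, t_max: hypothetical names for the bounds on the difference
    between consecutive indices; bound: hypothetical upper limit on the
    clustering function value.
    """
    gaps_ok = all(t_min <= b - a <= t_max
                  for a, b in zip(first_cluster, first_cluster[1:]))
    return gaps_ok and clustering_cost(points, first_cluster) <= bound
```

For instance, with `points = [(0, 0), (2, 0), (4, 0), (1, 1)]` and `first_cluster = [1, 2]`, the centroid is `(3, 0)`, the first cluster contributes 1 + 1, and the remaining points contribute 0 + 2. The problem then asks for a feasible index list of maximum length.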
We prove that the problem is strongly NP-hard and propose an exact algorithm for the case of input points with integer coordinates. If the space dimension is bounded by a constant, this algorithm runs in pseudopolynomial time.
Keywords: Euclidean space, Sequence 2-partition, Longest subsequence, Quadratic variance, NP-hard problem, Integer coordinates, Exact algorithm, Fixed space dimension, Pseudopolynomial running time
The study presented in Sects. 3 and 5 was supported by the Russian Science Foundation, project 16-11-10041. The study presented in Sects. 2 and 4 was supported by the Russian Foundation for Basic Research, projects 16-07-00168 and 18-31-00398, by the Russian Academy of Science (the Program of basic research), project 0314-2016-0015, and by the Russian Ministry of Science and Education under the 5-100 Excellence Programme.