Exact Algorithms for the Special Cases of Two Hard to Solve Problems of Searching for the Largest Subset
We consider two problems of searching for the largest subset in a finite set of points in Euclidean space. Both problems have applications, in particular, in data analysis and data approximation. In each problem, an input set and a positive real number are given, and it is required to find a subset (i.e., a cluster) of the largest size under a constraint on the value of a quadratic function. The points of the input set that lie outside the sought subset are treated as a second cluster. In the first problem, the function in the constraint is the sum over both clusters of the intracluster sums of the squared distances between the elements of the clusters and their centers. The center of the first (i.e., sought) cluster is unknown and is defined as the centroid, while the center of the second one is fixed at a given point in Euclidean space (without loss of generality, at the origin). In the second problem, the function in the constraint is the sum over both clusters of the weighted intracluster sums of the squared distances between the elements of the clusters and their centers. As in the first problem, the center of the first cluster is unknown and is defined as the centroid, while the center of the second one is fixed at the origin. In this paper, we show that both problems are strongly NP-hard. We also present exact algorithms for the cases of these problems in which the input points have integer components. If the space dimension is bounded by some constant, the algorithms are pseudopolynomial.
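To make the statement of the first problem concrete, the following is a minimal Python sketch of its constraint function together with an exhaustive search over subsets. It is an illustration of the problem definition only, not the exact algorithm of the paper: the search is exponential in the number of points, and the names `problem1_cost` and `largest_subset_bruteforce` are hypothetical. The second problem is not covered because the abstract does not specify the weights.

```python
from itertools import combinations


def squared_norm(p):
    """Squared Euclidean norm of a point given as a sequence of numbers."""
    return sum(x * x for x in p)


def problem1_cost(points, subset_idx):
    """Quadratic function of the first problem for a candidate subset.

    The chosen cluster is measured against its own centroid; every
    remaining point is measured against the origin.
    """
    chosen = [points[i] for i in subset_idx]
    rest = [p for i, p in enumerate(points) if i not in subset_idx]

    # Second cluster: center fixed at the origin.
    cost = sum(squared_norm(p) for p in rest)

    # First cluster: center is the centroid of the chosen points.
    if chosen:
        dim = len(chosen[0])
        centroid = [sum(p[d] for p in chosen) / len(chosen) for d in range(dim)]
        cost += sum(
            squared_norm([a - c for a, c in zip(p, centroid)]) for p in chosen
        )
    return cost


def largest_subset_bruteforce(points, bound):
    """Return indices of a largest subset whose cost is at most `bound`.

    Tries subset sizes from largest to smallest, so the first feasible
    subset found has maximum size. Returns [] if no non-empty subset
    satisfies the constraint.
    """
    n = len(points)
    for size in range(n, 0, -1):
        for idx in combinations(range(n), size):
            if problem1_cost(points, set(idx)) <= bound:
                return list(idx)
    return []


if __name__ == "__main__":
    pts = [(4, 0), (6, 0), (0, 1)]
    print(largest_subset_bruteforce(pts, 3))
```

In the small example, the two points near (5, 0) form the cluster, and the remaining point near the origin is charged its squared norm. Such a brute-force check is only practical for tiny inputs, which is consistent with the hardness result stated in the abstract.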
Keywords: Euclidean space, 2-clustering, Largest subset, NP-hardness, Exact algorithm, Pseudopolynomial-time solvability
The study of Problem 1 was supported by the Russian Science Foundation, project 16-11-10041. The study of Problem 2 was supported by the Russian Foundation for Basic Research, projects 16-07-00168 and 18-31-00398, by the Russian Academy of Science (the Program of Basic Research), project 0314-2016-0015, and by the Russian Ministry of Science and Education under the 5-100 Excellence Programme.