Abstract
We study the computational complexity of the following subset search problem. Given a set of N vectors in q-dimensional Euclidean space and an integer M, choose a subset of at least M vectors that minimizes the Euclidean norm of the arithmetic mean of the chosen vectors. The problem arises, in particular, from clustering a set of points into two clusters, where one cluster must consist of points whose mean is close to a given point. Without loss of generality, the given point may be assumed to be the origin.
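To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not an algorithm from the paper: it enumerates every subset of size at least M, so it runs in exponential time and is meant only for tiny instances. The function names `mean_norm` and `brute_force_shortest_average` are hypothetical, chosen for illustration.

```python
from itertools import combinations
from math import sqrt


def mean_norm(vectors):
    """Euclidean norm of the arithmetic mean of a non-empty list of vectors."""
    k = len(vectors)
    q = len(vectors[0])
    mean = [sum(v[j] for v in vectors) / k for j in range(q)]
    return sqrt(sum(c * c for c in mean))


def brute_force_shortest_average(vectors, m):
    """Return (best norm, chosen indices) over all subsets of size >= m.

    Illustration only: the number of subsets examined grows exponentially in N.
    """
    n = len(vectors)
    if not 1 <= m <= n:
        raise ValueError("m must satisfy 1 <= m <= N")
    best_value, best_subset = float("inf"), None
    for k in range(m, n + 1):
        for idx in combinations(range(n), k):
            value = mean_norm([vectors[i] for i in idx])
            if value < best_value:
                best_value, best_subset = value, idx
    return best_value, best_subset
```

For example, with the vectors (1, 0), (-1, 0), (0, 3) and M = 2, the first two vectors cancel, so their mean is the origin; with M = 3 all three vectors must be taken and the mean is (0, 1).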
We show that the problem is NP-hard in the strong sense and that, unless P = NP, it admits no polynomial-time approximation algorithm with a guaranteed performance bound. For the special case in which the dimension q of the space is bounded from above by a constant and the input data are integer, we propose an exact algorithm with pseudo-polynomial time complexity.
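One generic way to see why integer inputs and fixed dimension allow a pseudo-polynomial exact method is a dynamic program over reachable coordinate sums. The sketch below is an assumption-laden illustration of that idea and is not claimed to be the algorithm proposed in the paper. If every coordinate has absolute value at most b, each coordinate of a subset sum lies in [-Nb, Nb], so there are at most (2Nb + 1)^q distinct sums per subset size. The function name `min_squared_mean_norm` is hypothetical.

```python
from fractions import Fraction


def min_squared_mean_norm(vectors, m):
    """Minimum squared Euclidean norm of the mean over subsets of size >= m.

    Assumes integer coordinates. reachable[k] holds every sum vector that can
    be formed from exactly k of the input vectors (0/1 knapsack style).
    """
    n = len(vectors)
    q = len(vectors[0])
    if not 1 <= m <= n:
        raise ValueError("m must satisfy 1 <= m <= N")
    reachable = [set() for _ in range(n + 1)]
    reachable[0].add((0,) * q)
    for v in vectors:
        # Descend over k so that each vector is used at most once.
        for k in range(n - 1, -1, -1):
            for s in reachable[k]:
                reachable[k + 1].add(tuple(a + b for a, b in zip(s, v)))
    best = None
    for k in range(m, n + 1):
        for s in reachable[k]:
            # Squared norm of the mean is |s|^2 / k^2; kept exact as a Fraction.
            value = Fraction(sum(c * c for c in s), k * k)
            if best is None or value < best:
                best = value
    return best
```

The result is returned as an exact squared norm to avoid floating-point comparisons; taking its square root gives the objective value from the problem statement.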
Acknowledgements
This research is supported by RFBR, projects 15-01-00462, 16-01-00740 and 15-01-00976.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Eremeev, A.V., Kel’manov, A.V., Pyatkin, A.V. (2017). On Complexity of Searching a Subset of Vectors with Shortest Average Under a Cardinality Restriction. In: Ignatov, D., et al. Analysis of Images, Social Networks and Texts. AIST 2016. Communications in Computer and Information Science, vol 661. Springer, Cham. https://doi.org/10.1007/978-3-319-52920-2_5
DOI: https://doi.org/10.1007/978-3-319-52920-2_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52919-6
Online ISBN: 978-3-319-52920-2
eBook Packages: Computer Science (R0)