On Complexity of Searching a Subset of Vectors with Shortest Average Under a Cardinality Restriction

  • Conference paper
Analysis of Images, Social Networks and Texts (AIST 2016)

Abstract

In this paper, we study the computational complexity of the following subset search problem in a set of vectors. Given a set of N Euclidean q-dimensional vectors and an integer M, choose a subset of at least M vectors minimizing the Euclidean norm of the arithmetic mean of the chosen vectors. This problem arises, in particular, from the problem of partitioning a set of points into two clusters, where one of the clusters consists of points whose mean is close to a given point. Without loss of generality, the given point may be assumed to be the origin.
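To make the problem statement concrete, the following is a minimal brute-force sketch, not a method from the paper. It enumerates every subset of at least M vectors and returns the one whose arithmetic mean has the smallest Euclidean norm. The function names `mean_norm` and `brute_force_shortest_average` are hypothetical, and the running time is exponential in N, so this is only usable on tiny instances.

```python
from itertools import combinations
from math import sqrt


def mean_norm(vectors):
    """Euclidean norm of the arithmetic mean of a non-empty list of vectors."""
    n = len(vectors)
    q = len(vectors[0])
    return sqrt(sum((sum(v[j] for v in vectors) / n) ** 2 for j in range(q)))


def brute_force_shortest_average(vectors, m):
    """Try every subset with at least m vectors (m >= 1 is assumed).

    Returns (best_value, best_indices), where best_indices is a tuple of
    positions in `vectors`. Exponential time: for illustration only.
    """
    best_value, best_indices = None, None
    for k in range(m, len(vectors) + 1):
        for indices in combinations(range(len(vectors)), k):
            value = mean_norm([vectors[i] for i in indices])
            if best_value is None or value < best_value:
                best_value, best_indices = value, indices
    return best_value, best_indices


if __name__ == "__main__":
    # Illustrative data: the first two vectors cancel each other out.
    points = [(1, 0), (-1, 0), (0, 3)]
    print(brute_force_shortest_average(points, 2))
```

With M = 2 on the illustrative data, the two opposite vectors form a subset whose mean is the origin. With M = 3 the whole set must be taken, and the mean is (0, 1).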

We show that the considered problem is NP-hard in the strong sense and that it does not admit any approximation algorithm with guaranteed performance, unless P = NP. An exact algorithm with pseudo-polynomial time complexity is proposed for the special case of the problem in which the dimension q of the space is bounded from above by a constant and the input data are integers.
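The abstract does not describe the exact algorithm, so the following is only one possible sketch of how a pseudo-polynomial method for integer inputs could look, and it should not be read as the authors' algorithm. It is a standard dynamic program over reachable sum vectors: for each cardinality k it stores every integer vector that is the sum of some k input vectors, then picks the pair (k, sum) with k >= M that minimizes the norm of sum / k. If every coordinate has absolute value at most B, each set holds at most (2NB + 1)^q sums, which is pseudo-polynomial when q is a fixed constant. The function name `min_mean_norm_dp` is hypothetical.

```python
from math import sqrt


def min_mean_norm_dp(vectors, m):
    """Smallest Euclidean norm of the mean over subsets of size >= m.

    Assumes integer coordinates and m >= 1. reachable[k] holds every sum
    vector obtainable from exactly k distinct input vectors.
    """
    n = len(vectors)
    q = len(vectors[0])
    reachable = [set() for _ in range(n + 1)]
    reachable[0].add((0,) * q)

    for v in vectors:
        # Go through cardinalities in decreasing order so that v is
        # added to each partial sum at most once.
        for k in range(n - 1, -1, -1):
            for s in reachable[k]:
                reachable[k + 1].add(tuple(a + b for a, b in zip(s, v)))

    best = None
    for k in range(m, n + 1):
        for s in reachable[k]:
            value = sqrt(sum(c * c for c in s)) / k
            if best is None or value < best:
                best = value
    return best


if __name__ == "__main__":
    points = [(1, 0), (-1, 0), (0, 3)]  # illustrative data
    print(min_mean_norm_dp(points, 2), min_mean_norm_dp(points, 3))
```

This sketch returns only the optimal value. Recovering the subset itself would need back-pointers stored alongside each reachable sum.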

Acknowledgements

This research is supported by RFBR, projects 15-01-00462, 16-01-00740 and 15-01-00976.

Author information

Corresponding author

Correspondence to Anton V. Eremeev.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Eremeev, A.V., Kel’manov, A.V., Pyatkin, A.V. (2017). On Complexity of Searching a Subset of Vectors with Shortest Average Under a Cardinality Restriction. In: Ignatov, D., et al. Analysis of Images, Social Networks and Texts. AIST 2016. Communications in Computer and Information Science, vol 661. Springer, Cham. https://doi.org/10.1007/978-3-319-52920-2_5

  • DOI: https://doi.org/10.1007/978-3-319-52920-2_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-52919-6

  • Online ISBN: 978-3-319-52920-2

  • eBook Packages: Computer Science, Computer Science (R0)
