
Exact Algorithms for the Special Cases of Two Hard to Solve Problems of Searching for the Largest Subset

  • Conference paper
Analysis of Images, Social Networks and Texts (AIST 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11179)

Abstract

We consider two problems of searching for the largest subset of a finite set of points in Euclidean space. Both problems have applications in data analysis and data approximation, among other areas. In each problem, an input set and a positive real number are given, and it is required to find a subset (i.e., a cluster) of the largest size such that the value of a quadratic function does not exceed the given number. The points of the input set that lie outside the sought subset are treated as a second cluster. In the first problem, the function in the constraint is the sum, over both clusters, of the intracluster sums of squared distances between the cluster elements and the cluster centers. The center of the first (sought) cluster is unknown and is defined as its centroid, while the center of the second cluster is fixed at a given point of Euclidean space (without loss of generality, the origin). In the second problem, the function in the constraint is the sum, over both clusters, of the weighted intracluster sums of squared distances between the cluster elements and their centers. As in the first problem, the center of the first cluster is unknown and is defined as its centroid, while the center of the second cluster is fixed at the origin. We show that both problems are strongly NP-hard. We also present exact algorithms for the special cases of these problems in which the input points have integer components. If the dimension of the space is bounded by a constant, these algorithms are pseudopolynomial.
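To make the first problem concrete: given points \(y_1, \dots, y_N \in \mathbb{R}^d\) and \(A > 0\), it asks for a subset \(C\) of maximum size such that \(\sum_{y \in C} \|y - \bar{y}(C)\|^2 + \sum_{y \notin C} \|y\|^2 \le A\), where \(\bar{y}(C)\) is the centroid of \(C\). The Python sketch below is not the authors' algorithm, which the abstract does not describe. It only illustrates one way an exact search can be pseudopolynomial for integer inputs in fixed dimension, under the assumption that the constraint has exactly this form. It relies on three facts. First, the centroid minimizes \(\sum_{y \in C}\|y - x\|^2\) over all centers \(x\), so the best value over \(k\)-subsets equals the minimum over centers \(x\) of the best \(k\)-subset for that fixed \(x\). Second, for a fixed \(x\), that best \(k\)-subset consists of the \(k\) points with the smallest values of \(\|y - x\|^2 - \|y\|^2\). Third, the centroid of any \(k\) integer points lies on the grid \((1/k)\mathbb{Z}^d\) inside the bounding box of the input. All function names, including the brute-force checker, are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product
from typing import Iterable, Sequence, Tuple

Point = Tuple[int, ...]


def sq_dist(a: Sequence, b: Sequence):
    """Squared Euclidean distance (exact for ints and Fractions)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def objective(points: Sequence[Point], subset: Iterable[int]):
    """Constraint value of Problem 1 (assumed form): scatter of the chosen
    cluster around its centroid plus scatter of the other points around 0."""
    chosen = set(subset)
    dim = len(points[0])
    centroid = tuple(
        Fraction(sum(points[i][j] for i in chosen), len(chosen)) for j in range(dim)
    )
    inside = sum(sq_dist(points[i], centroid) for i in chosen)
    outside = sum(sq_dist(p, (0,) * dim) for i, p in enumerate(points) if i not in chosen)
    return inside + outside


def largest_subset_brute(points: Sequence[Point], A) -> int:
    """Exponential reference solver: try every subset, largest sizes first."""
    n = len(points)
    for k in range(n, 0, -1):
        if any(objective(points, c) <= A for c in combinations(range(n), k)):
            return k
    return 0


def largest_subset_grid(points: Sequence[Point], A) -> int:
    """Hypothetical exact search over candidate centers on the grid (1/k)Z^d."""
    n, dim = len(points), len(points[0])
    origin = (0,) * dim
    total = sum(sq_dist(p, origin) for p in points)  # sum of |y|^2 over all points
    lo = [min(p[j] for p in points) for j in range(dim)]
    hi = [max(p[j] for p in points) for j in range(dim)]
    best = 0
    for k in range(1, n + 1):
        # The centroid of any k integer points has coordinates t/k with
        # k*lo[j] <= t <= k*hi[j]; enumerate all such candidate centers.
        axes = [[Fraction(t, k) for t in range(k * lo[j], k * hi[j] + 1)] for j in range(dim)]
        for x in product(*axes):
            # With x fixed, F(C, x) = total + sum over C of (|y - x|^2 - |y|^2),
            # so the cheapest k-subset uses the k smallest differences.
            gains = sorted(sq_dist(p, x) - sq_dist(p, origin) for p in points)
            if total + sum(gains[:k]) <= A:
                best = k  # some k-subset satisfies the constraint
                break
    return best


pts = [(3, 0), (3, 2), (0, 1)]
print(largest_subset_grid(pts, 5))  # 2
print(largest_subset_grid(pts, 8))  # 3
```

For each \(k\), the number of candidate centers is at most \(\prod_j (k(\max_j - \min_j) + 1)\), which is polynomial in \(N\) and the coordinate range when \(d\) is a constant. The smallest constraint value need not be monotone in \(k\): in the example it is 10, 3 and 8 for \(k = 1, 2, 3\). For this reason the sketch checks every size instead of stopping at the first infeasible one.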

Acknowledgments

The study of Problem 1 was supported by the Russian Science Foundation, project 16-11-10041. The study of Problem 2 was supported by the Russian Foundation for Basic Research, projects 16-07-00168 and 18-31-00398, by the Russian Academy of Science (the Program of Basic Research), project 0314-2016-0015, and by the Russian Ministry of Science and Education under the 5-100 Excellence Programme.

Author information

Correspondence to Vladimir Khandeev.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Kel’manov, A., Khandeev, V., Panasenko, A. (2018). Exact Algorithms for the Special Cases of Two Hard to Solve Problems of Searching for the Largest Subset. In: van der Aalst, W., et al. Analysis of Images, Social Networks and Texts. AIST 2018. Lecture Notes in Computer Science, vol. 11179. Springer, Cham. https://doi.org/10.1007/978-3-030-11027-7_28

  • DOI: https://doi.org/10.1007/978-3-030-11027-7_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-11026-0

  • Online ISBN: 978-3-030-11027-7

  • eBook Packages: Computer Science, Computer Science (R0)