K-Medoids Clustering Is Solvable in Polynomial Time for a 2d Pareto Front

  • Nicolas Dupin (email author)
  • Frank Nielsen
  • El-Ghazali Talbi
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 991)

Abstract

The k-medoids problem is a discrete sum-of-squares clustering problem that is known to be more robust to outliers than k-means clustering. As an optimization problem, k-medoids is NP-hard. This paper examines k-medoids clustering in the case of a two-dimensional Pareto front, as generated by bi-objective optimization approaches. A characterization of optimal clusters is provided for this case, which allows k-medoids to be solved to optimality in polynomial time using a dynamic programming algorithm. More precisely, for N points to cluster, the algorithm is proven to run in \(O(N^3)\) time and \(O(N^2)\) memory space. The algorithm can also be used to jointly minimize the number of clusters and the dissimilarity of clusters. This bi-objective extension is also solvable to optimality in \(O(N^3)\) time and \(O(N^2)\) memory space, which is useful for choosing an appropriate number of clusters in real-life applications. Parallelization issues are also discussed, in order to speed up the algorithm in practice.
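
The dynamic programming idea can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes that the characterization of optimal clusters means each cluster is a contiguous block of points once the front is sorted by the first objective (the abstract does not spell this out), and that dissimilarity is the sum of squared Euclidean distances to the medoid. The names `sq_dist`, `interval_costs`, and `kmedoids_costs` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two 2D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def interval_costs(pts: Sequence[Point]) -> List[List[float]]:
    """cost[i][j] = best single-medoid cost of the contiguous block pts[i..j]."""
    n = len(pts)
    cost = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # sums[c - i] = sum of squared distances from candidate medoid c
        # to every point currently in the block pts[i..j].
        sums: List[float] = []
        for j in range(i, n):
            for c in range(i, j):
                sums[c - i] += sq_dist(pts[c], pts[j])
            sums.append(sum(sq_dist(pts[j], pts[m]) for m in range(i, j)))
            cost[i][j] = min(sums)
    return cost


def kmedoids_costs(points: Sequence[Point], k_max: int) -> List[float]:
    """Return the optimal cost with exactly k blocks, for k = 1..k_max.

    Assumption (not stated in the abstract): optimal clusters are
    contiguous blocks of the front sorted by the first objective.
    """
    pts = sorted(points)
    n = len(pts)
    cost = interval_costs(pts)
    inf = float("inf")
    # dp[k][j] = optimal cost of splitting the first j points into k blocks.
    dp = [[inf] * (n + 1) for _ in range(k_max + 1)]
    dp[0][0] = 0.0
    for k in range(1, k_max + 1):
        for j in range(k, n + 1):
            # The last block is pts[i..j-1]; the first i points use k-1 blocks.
            dp[k][j] = min(dp[k - 1][i] + cost[i][j - 1] for i in range(k - 1, j))
    return [dp[k][n] for k in range(1, k_max + 1)]


# Toy front with two visible groups (illustrative data only).
front = [(0, 10), (1, 9), (2, 8), (10, 2), (11, 1), (12, 0)]
print(kmedoids_costs(front, 3))  # [404.0, 8.0, 6.0]
```

Because the table holds the optimal cost for every number of clusters up to `k_max`, a single run also exposes the trade-off between cluster count and dissimilarity. Under this sketch's assumptions, with `k_max = N` the interval-cost step and the recurrence each take O(N^3) operations, and the two tables hold O(N^2) entries.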

Keywords

Bi-objective optimization · Clustering algorithms · K-medoids · Euclidean sum-of-squares clustering · Pareto front · Dynamic programming · Bi-objective clustering

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Nicolas Dupin (1, email author)
  • Frank Nielsen (2)
  • El-Ghazali Talbi (3)
  1. LRI, Université Paris-Sud, Université Paris-Saclay, Paris, France
  2. Sony Computer Science Laboratories Inc., Tokyo, Japan
  3. Univ. Lille, UMR 9189 - CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, Lille, France
