Abstract
The k-medoids problem is a discrete sum-of-squares clustering problem, known to be more robust to outliers than k-means clustering. As an optimization problem, k-medoids is NP-hard. This paper examines k-medoids clustering in the case of a two-dimensional Pareto front, as generated by bi-objective optimization approaches. A characterization of optimal clusters is provided in this case, which allows k-medoids to be solved to optimality in polynomial time using a dynamic programming algorithm. More precisely, for N points to cluster, the algorithm is proven to run in \(O(N^3)\) time and \(O(N^2)\) memory space. The algorithm can also be used to jointly minimize the number of clusters and the dissimilarity of clusters. This bi-objective extension is also solvable to optimality in \(O(N^3)\) time and \(O(N^2)\) memory space, which is useful for choosing an appropriate number of clusters in real-life applications. Parallelization issues are also discussed, to speed up the algorithm in practice.
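As an illustration of the kind of dynamic program the abstract describes, here is a minimal Python sketch. It is not the authors' implementation. It assumes that the characterization of optimal clusters is an interval property: once the Pareto front is sorted by its first coordinate, each optimal cluster is a contiguous run of points. That assumption is not stated in the abstract. The function name `kmedoids_pareto_2d` and all variable names are hypothetical, and the dissimilarity is taken to be the squared Euclidean distance to the medoid.

```python
from itertools import accumulate


def kmedoids_pareto_2d(points, k):
    """Sketch of interval-based k-medoids; returns (cost, clusters).

    ASSUMPTION (not from the abstract): optimal clusters are contiguous
    runs of the points sorted by first coordinate.
    """
    pts = sorted(points)
    n = len(pts)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= number of points")

    # prefix[m][j] = sum of squared distances from pts[m] to pts[0..j-1].
    # N x (N+1) table: O(N^2) memory, built in O(N^2) time.
    prefix = []
    for mx, my in pts:
        sq = [(mx - x) ** 2 + (my - y) ** 2 for x, y in pts]
        prefix.append([0] + list(accumulate(sq)))

    # cost[i][j] = cost of the best medoid for the interval pts[i..j].
    # Each entry scans the candidate medoids in the interval: O(N^3) total.
    cost = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            cost[i][j] = min(
                prefix[m][j + 1] - prefix[m][i] for m in range(i, j + 1)
            )

    # best[c][j] = minimal cost of splitting the first j points into c intervals.
    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c, j + 1):  # last cluster is pts[i-1..j-1]
                value = best[c - 1][i - 1] + cost[i - 1][j - 1]
                if value < best[c][j]:
                    best[c][j] = value
                    cut[c][j] = i - 1

    # Backtrack the cut positions to recover the clusters.
    clusters = []
    j = n
    for c in range(k, 0, -1):
        i = cut[c][j]
        clusters.append(pts[i:j])
        j = i
    clusters.reverse()
    return best[k][n], clusters
```

For example, `kmedoids_pareto_2d([(0, 10), (1, 9), (2, 8), (10, 2), (11, 1), (12, 0)], 2)` splits the six points into their two obvious groups of three. In this sketch the interval-cost table dominates the running time at \(O(N^3)\) and the tables occupy \(O(N^2)\) memory when k is at most N, in line with the bounds quoted above, although the paper's own algorithm may organize the computation differently.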
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Dupin, N., Nielsen, F., Talbi, EG. (2020). K-Medoids Clustering Is Solvable in Polynomial Time for a 2d Pareto Front. In: Le Thi, H., Le, H., Pham Dinh, T. (eds) Optimization of Complex Systems: Theory, Models, Algorithms and Applications. WCGO 2019. Advances in Intelligent Systems and Computing, vol 991. Springer, Cham. https://doi.org/10.1007/978-3-030-21803-4_79
DOI: https://doi.org/10.1007/978-3-030-21803-4_79
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21802-7
Online ISBN: 978-3-030-21803-4
eBook Packages: Intelligent Technologies and Robotics (R0)