Abstract
Some of the most commonly used clustering algorithms are those aimed at solving the well-known k-median problem. Their main advantages are that they are simple to implement and use, and that they allow a flexible choice of dissimilarity measure, which need not be a metric. K-median algorithms are also known to be more robust to noise and outliers than k-means algorithms. Nevertheless, they have seen limited use in large-scale clustering problems because of their high computational and space complexity. This work presents a computational comparison of k-median clustering algorithms in a specific large-scale setting: clustering large image collections. We implement distributed versions of the most common k-median clustering algorithms and compare them with our parallel heuristic for solving large-scale k-median problem instances. We analyze the clustering results with respect to external evaluation measures and run time.
Acknowledgement
This work is supported by the Russian Science Foundation under grant 17-71-10176.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Ushakov, A.V., Vasilyev, I. (2019). A Computational Comparison of Parallel and Distributed K-median Clustering Algorithms on Large-Scale Image Data. In: Bykadorov, I., Strusevich, V., Tchemisova, T. (eds) Mathematical Optimization Theory and Operations Research. MOTOR 2019. Communications in Computer and Information Science, vol 1090. Springer, Cham. https://doi.org/10.1007/978-3-030-33394-2_10
DOI: https://doi.org/10.1007/978-3-030-33394-2_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33393-5
Online ISBN: 978-3-030-33394-2
eBook Packages: Computer Science, Computer Science (R0)