Abstract
This paper reviews the well-known K-means, H-means, and J-means heuristics, and their variants, which are used to solve the minimum sum-of-squares clustering problem. We then develop two new local searches that combine these heuristics in a nested and in a sequential structure, an approach also referred to as variable neighborhood descent. To show how these local searches can be implemented within a metaheuristic framework, we apply the new heuristics in the local improvement step of two variable neighborhood search (VNS) procedures. Computational experiments suggest that this new and simple application of VNS is comparable to the state of the art. In addition, an improvement of over 30% in solution quality is obtained for the largest problem instance investigated, which contains 85,900 entities.
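As an illustration of the kind of local search the abstract refers to, the following is a minimal sketch of the minimum sum-of-squares objective together with an alternating descent of the type usually called H-means in this literature (assign each entity to its closest centroid, then recompute centroids). The abstract does not define the heuristics, so the function names (`sq_dist`, `assign`, `recompute`, `sum_of_squares`, `h_means`), the naming convention, and the handling of empty clusters are assumptions made for illustration, not the chapter's implementation.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Assign each entity to the index of its closest centroid (ties go to the lowest index)."""
    return [
        min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        for p in points
    ]


def recompute(points: Sequence[Point], labels: Sequence[int],
              centroids: Sequence[Point]) -> List[List[float]]:
    """Move each centroid to the mean of its cluster.

    Assumption: an empty cluster keeps its old centroid; degenerate
    solutions are not repaired in this sketch.
    """
    dim = len(points[0])
    updated = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            updated.append(list(old))
        else:
            updated.append(
                [sum(p[d] for p in members) / len(members) for d in range(dim)]
            )
    return updated


def sum_of_squares(points: Sequence[Point], labels: Sequence[int],
                   centroids: Sequence[Point]) -> float:
    """Objective value: sum of squared distances from entities to their centroids."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


def h_means(points: Sequence[Point], centroids: Sequence[Point],
            max_iter: int = 100) -> Tuple[List[int], List[List[float]], float]:
    """Alternate assignment and centroid updates until the assignment stops changing."""
    current = [list(c) for c in centroids]
    labels = assign(points, current)
    for _ in range(max_iter):
        current = recompute(points, labels, current)
        new_labels = assign(points, current)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, current, sum_of_squares(points, labels, current)


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    start = [(0.0, 0.0), (1.0, 1.0)]
    labels, centers, cost = h_means(data, start)
    print(labels, centers, cost)
```

A descent of this kind stops at a local minimum that depends on the starting centroids, which is why such heuristics are typically combined with other neighborhoods or embedded in a metaheuristic such as VNS.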
Acknowledgements
Thiago Pereira is grateful to CAPES-Brazil. Daniel Aloise and Nenad Mladenović were partially supported by CNPq-Brazil grants 308887/2014-0 and 400350/2014-9. This research was also partially supported within the framework of grant number BR05236839, “Development of information technologies and systems for stimulation of personality’s sustainable development as one of the bases of development of digital Kazakhstan”.
Copyright information
© 2018 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Pereira, T., Aloise, D., Brimberg, J., Mladenović, N. (2018). Review of Basic Local Searches for Solving the Minimum Sum-of-Squares Clustering Problem. In: Pardalos, P., Migdalas, A. (eds) Open Problems in Optimization and Data Analysis. Springer Optimization and Its Applications, vol 141. Springer, Cham. https://doi.org/10.1007/978-3-319-99142-9_13
DOI: https://doi.org/10.1007/978-3-319-99142-9_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99141-2
Online ISBN: 978-3-319-99142-9
eBook Packages: Mathematics and Statistics (R0)