Abstract
This paper compares the performance of k-means and k-medoids in clustering objects with mixed variables. k-means was originally intended for clustering objects with continuous variables, as it uses Euclidean distance to compute the distance between objects. In contrast, k-medoids, particularly in the form of PAM (partitioning around medoids), is suited to variables of mixed types. Using a mixed-variable data set derived from modified cancer data, we compared k-means and k-medoids on internal validity measures computed with an R package. The results indicate that k-medoids is a good clustering option when the measured variables are of different types.
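To make the contrast concrete, the following is a minimal sketch of medoid-based clustering on mixed data, not the procedure used in the paper. It assumes a Gower-style dissimilarity (range-scaled absolute difference for numeric variables, simple mismatch for categorical ones) and uses a simple alternating update rather than the full PAM swap search. The function names, the `kinds` encoding, and the toy records are all hypothetical and chosen only for illustration.

```python
def gower(a, b, kinds, ranges):
    """Gower-style dissimilarity between two records (assumed measure).

    kinds[i] is "num" or "cat"; ranges[i] is the observed range of
    numeric variable i (unused for categorical variables).
    """
    total = 0.0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if kind == "num":
            # Range-scaled absolute difference, in [0, 1].
            total += abs(x - y) / rng if rng else 0.0
        else:
            # Simple matching: 0 if equal, 1 otherwise.
            total += 0.0 if x == y else 1.0
    return total / len(kinds)


def assign(dist, medoids):
    """Label each record with the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
        for i in range(len(dist))
    ]


def k_medoids(data, kinds, k, max_iter=100):
    """Alternating k-medoids (a simplification of PAM, not PAM itself)."""
    n = len(data)
    ranges = [
        max(r[i] for r in data) - min(r[i] for r in data) if kind == "num" else None
        for i, kind in enumerate(kinds)
    ]
    # Precompute the full dissimilarity matrix.
    dist = [[gower(data[i], data[j], kinds, ranges) for j in range(n)] for i in range(n)]
    medoids = list(range(k))  # naive deterministic start: the first k records
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if a cluster ends up empty.
                new_medoids.append(medoids[c])
                continue
            # The medoid is the member with the smallest total
            # dissimilarity to the other members of its cluster.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Hypothetical toy records: (age, smoker, group).
records = [
    (25, "yes", "A"), (27, "yes", "A"), (26, "yes", "B"),
    (60, "no", "C"), (62, "no", "C"), (58, "no", "B"),
]
kinds = ["num", "cat", "cat"]
medoids, labels = k_medoids(records, kinds, k=2)
print("medoid rows:", medoids)
print("labels:", labels)
```

Because each cluster centre is an actual record and only pairwise dissimilarities are needed, categorical variables never have to be averaged, which is the step that a mean-based method such as k-means cannot perform directly on such data.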
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Shamsuddin, N.R., Mahat, N.I. (2019). Comparison Between k-Means and k-Medoids for Mixed Variables Clustering. In: Kor, LK., Ahmad, AR., Idrus, Z., Mansor, K. (eds) Proceedings of the Third International Conference on Computing, Mathematics and Statistics (iCMS2017). Springer, Singapore. https://doi.org/10.1007/978-981-13-7279-7_37
DOI: https://doi.org/10.1007/978-981-13-7279-7_37
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-7278-0
Online ISBN: 978-981-13-7279-7
eBook Packages: Computer Science, Computer Science (R0)