Comparison Between k-Means and k-Medoids for Mixed Variables Clustering

  • Norin Rahayu Shamsuddin
  • Nor Idayu Mahat
Conference paper

Abstract

This paper compares the performance of k-means and k-medoids in clustering objects described by mixed variables. The k-means algorithm was originally intended for objects measured on continuous variables, since it uses the Euclidean distance to compute the distance between objects. By contrast, k-medoids works with any dissimilarity measure and is therefore suitable for mixed-type variables, especially in its PAM (Partitioning Around Medoids) form. Using a modified cancer data set with mixed variables, we compared k-means and k-medoids on internal validity indices computed in R. The results indicate that k-medoids is a good clustering option when the measured variables are of different types.
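k-means needs a per-cluster mean and Euclidean distance, neither of which is defined for a nominal variable such as smoking status; k-medoids only needs pairwise dissimilarities, so it can run on a mixed-type measure. The following is a minimal Python sketch, not the authors' R code. It assumes Gower's dissimilarity (range-scaled absolute difference for numeric columns, simple mismatch for nominal ones, equal weights, no missing values) and a simplified PAM: the BUILD step as described by Kaufman and Rousseeuw, followed by a SWAP phase that accepts any improving swap instead of searching for the single best one. The helper names `gower_matrix`, `total_cost` and `pam` and the toy rows are hypothetical; the abstract does not state which dissimilarity or R settings were used.

```python
def gower_matrix(rows, kinds):
    """Gower dissimilarity for mixed data (assumed: equal weights, no missing values).

    rows  -- list of tuples, one per object
    kinds -- "num" or "cat" for each column
    """
    n, p = len(rows), len(kinds)
    # Range of each numeric column, used to scale differences into [0, 1].
    ranges = [
        max(r[j] for r in rows) - min(r[j] for r in rows) if kind == "num" else None
        for j, kind in enumerate(kinds)
    ]
    d = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            total = 0.0
            for j, kind in enumerate(kinds):
                if kind == "num":
                    r = ranges[j]
                    total += abs(rows[a][j] - rows[b][j]) / r if r > 0 else 0.0
                else:
                    # Nominal column: 0 if the categories match, 1 otherwise.
                    total += 0.0 if rows[a][j] == rows[b][j] else 1.0
            d[a][b] = d[b][a] = total / p
    return d


def total_cost(d, medoids):
    """Sum of each object's dissimilarity to its nearest medoid."""
    return sum(min(d[i][m] for m in medoids) for i in range(len(d)))


def pam(d, k):
    """Simplified PAM on a precomputed dissimilarity matrix."""
    n = len(d)
    # BUILD: start with the most central object, then greedily add the
    # object that lowers the total cost the most.
    medoids = [min(range(n), key=lambda i: sum(d[i]))]
    while len(medoids) < k:
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(d, medoids + [c])))
    # SWAP: replace a medoid by a non-medoid whenever that lowers the cost.
    cost = total_cost(d, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids[:pos] + [h] + medoids[pos + 1:]
                trial_cost = total_cost(d, trial)
                if trial_cost < cost - 1e-12:
                    medoids, cost, improved = trial, trial_cost, True
    # Assign every object to its nearest medoid.
    labels = [min(range(k), key=lambda m: d[i][medoids[m]]) for i in range(n)]
    return medoids, labels, cost


# Hypothetical toy data: (age, smoker, stage), not the paper's data set.
rows = [
    (25, "yes", "A"),
    (27, "yes", "A"),
    (30, "yes", "A"),
    (60, "no", "B"),
    (62, "no", "B"),
    (65, "no", "B"),
]
kinds = ["num", "cat", "cat"]

d = gower_matrix(rows, kinds)
medoids, labels, cost = pam(d, k=2)
print(sorted(medoids))  # [1, 4]
print(round(cost, 4))   # 0.0833
```

Because the medoids are actual objects, each cluster centre here is a real row (for example, the 27-year-old smoker), which stays interpretable for nominal columns where a k-means centroid would not.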

Keywords

Mixed variables; k-means; k-medoids; Silhouette; Dunn index
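Both internal validity indices named in the keywords can be computed from any dissimilarity matrix and a vector of cluster labels. This is a sketch of the usual textbook definitions in Python, not the R implementation the authors used. For the average silhouette width, each object gets a = mean dissimilarity to its own cluster and b = smallest mean dissimilarity to another cluster, s = (b - a) / max(a, b), with s = 0 for singletons. The Dunn index here is the smallest between-cluster dissimilarity divided by the largest within-cluster dissimilarity (diameter). Higher values are better for both. The function names and the 4 x 4 matrix are made up for illustration, and at least two clusters are assumed.

```python
def silhouette_width(d, labels):
    """Average silhouette width for a dissimilarity matrix d and cluster labels."""
    n = len(d)
    clusters = sorted(set(labels))
    scores = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: s(i) = 0 by convention
            continue
        a = sum(d[i][j] for j in own) / len(own)
        # b: smallest average dissimilarity to any other cluster.
        b = min(
            sum(d[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


def dunn_index(d, labels):
    """Min between-cluster dissimilarity / max within-cluster diameter."""
    n = len(d)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    min_between = min(d[i][j] for i, j in pairs if labels[i] != labels[j])
    max_within = max((d[i][j] for i, j in pairs if labels[i] == labels[j]), default=0.0)
    return min_between / max_within if max_within > 0 else float("inf")


# Made-up dissimilarities: objects 0,1 are close, objects 2,3 are close.
d = [
    [0, 1, 4, 5],
    [1, 0, 5, 4],
    [4, 5, 0, 1],
    [5, 4, 1, 0],
]
good = [0, 0, 1, 1]
bad = [0, 1, 0, 1]

print(round(silhouette_width(d, good), 3), dunn_index(d, good))  # 0.778 4.0
print(round(silhouette_width(d, bad), 3), dunn_index(d, bad))    # -0.25 0.25
```

The natural partition scores well on both indices, while the crossed partition gets a negative silhouette and a Dunn index below 1, which is the kind of contrast such indices are used to detect when comparing the partitions produced by k-means and k-medoids.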

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Shah Alam, Malaysia
  2. School of Quantitative Sciences, College of Arts and Sciences, Universiti Utara Malaysia, Changlun, Malaysia