Comparison Between k-Means and k-Medoids for Mixed Variables Clustering

  • Conference paper

Abstract

This paper compares the performance of k-means and k-medoids in clustering objects described by mixed variables. k-means was originally intended for objects with continuous variables, because it uses Euclidean distance to measure the distance between objects. In contrast, k-medoids, particularly in its PAM (partitioning around medoids) form, is suited to variables of mixed types. Using a mixed-variable data set derived from modified cancer data, we compared k-means and k-medoids on the internal validity measures provided in an R package. The results indicate that k-medoids is a good clustering option when the measured variables are of mixed types.
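The contrast drawn in the abstract can be illustrated with a small sketch. It is not the authors' R workflow: it assumes a Gower-style dissimilarity (range-scaled absolute difference for numeric columns, simple mismatch for categorical ones), which the abstract does not name, and it finds medoids by brute force rather than with PAM's build and swap steps. The helper names `gower` and `k_medoids_exhaustive` and the toy records are hypothetical.

```python
from itertools import combinations


def gower(a, b, kinds, ranges):
    """Gower-style dissimilarity between two records (assumed measure).

    kinds[i] is "num" or "cat"; ranges[i] is the observed range of a
    numeric column and is ignored for categorical columns.
    """
    total = 0.0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if kind == "num":
            # Range-scaled absolute difference, so the value lies in [0, 1].
            total += abs(x - y) / rng if rng else 0.0
        else:
            # Simple matching: 0 if the categories agree, 1 otherwise.
            total += 0.0 if x == y else 1.0
    return total / len(kinds)


def k_medoids_exhaustive(rows, kinds, k):
    """Choose the k rows that minimise the total dissimilarity of every
    row to its nearest medoid. Exhaustive search, for toy data only."""
    ranges = [
        max(r[i] for r in rows) - min(r[i] for r in rows) if kind == "num" else None
        for i, kind in enumerate(kinds)
    ]
    n = len(rows)
    d = [[gower(rows[i], rows[j], kinds, ranges) for j in range(n)] for i in range(n)]
    medoids = min(
        combinations(range(n), k),
        key=lambda meds: sum(min(d[i][m] for m in meds) for i in range(n)),
    )
    # Label each row with the index of its nearest medoid.
    labels = [min(medoids, key=lambda m: d[i][m]) for i in range(n)]
    return medoids, labels


# Hypothetical mixed records: (age in years, smoker status).
rows = [(25, "no"), (27, "no"), (30, "no"), (52, "yes"), (55, "yes"), (60, "yes")]
kinds = ["num", "cat"]

medoids, labels = k_medoids_exhaustive(rows, kinds, k=2)
print(medoids)  # (1, 4)
print(labels)   # [1, 1, 1, 4, 4, 4]
```

Because each medoid is an actual record, only pairwise dissimilarities are needed, so a categorical column never has to be averaged. k-means, by contrast, needs a mean for every column, which is why it is usually restricted to continuous variables.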

Author information

Corresponding author

Correspondence to Norin Rahayu Shamsuddin.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Shamsuddin, N.R., Mahat, N.I. (2019). Comparison Between k-Means and k-Medoids for Mixed Variables Clustering. In: Kor, LK., Ahmad, AR., Idrus, Z., Mansor, K. (eds) Proceedings of the Third International Conference on Computing, Mathematics and Statistics (iCMS2017). Springer, Singapore. https://doi.org/10.1007/978-981-13-7279-7_37

  • DOI: https://doi.org/10.1007/978-981-13-7279-7_37

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-7278-0

  • Online ISBN: 978-981-13-7279-7

  • eBook Packages: Computer Science, Computer Science (R0)
