Abstract

K-medoids clustering is a popular partition-based clustering technique for identifying typical patterns in a data set, and much recent work has aimed at finding fast algorithms for it. In this paper, we propose an efficient K-medoids clustering algorithm that retains the clustering quality of the simple and fast K-medoids algorithm (Park & Jun, 2009) while improving its computational efficiency. The proposed algorithm does not require precomputing the full pairwise distance matrix and is therefore applicable to large-scale data sets. With a simple pruning rule, it can achieve near-linear running time. We analyze the complexity of the proposed algorithm and show that it is lower than that of state-of-the-art K-medoids algorithms. Experiments on real data sets with millions of examples show that the proposed algorithm outperforms state-of-the-art K-medoids clustering algorithms.
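
The key property claimed above is that K-medoids can run without a precomputed n × n distance matrix. As a minimal sketch of that idea only, assuming Euclidean distance and a generic alternating (assign, then update) K-medoids scheme rather than the authors' exact algorithm, the Python code below computes every distance on demand, so memory grows linearly with the number of points. The function names (`nearest_medoid_labels`, `update_medoids`, `k_medoids`) and the `init`/`seed` parameters are hypothetical. The initialization (given indices or a seeded random sample) is an assumption, not the one used by Park and Jun (2009), and the chapter's pruning rule is not reproduced.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def nearest_medoid_labels(points: List[Point], medoids: List[int]) -> Tuple[List[int], float]:
    """Assign each point to its closest medoid and return (labels, total cost).

    Distances are computed on demand with math.dist, so no O(n^2)
    distance matrix is ever stored.
    """
    labels: List[int] = []
    total = 0.0
    for p in points:
        dists = [math.dist(p, points[m]) for m in medoids]
        best = min(range(len(medoids)), key=dists.__getitem__)
        labels.append(best)
        total += dists[best]
    return labels, total


def update_medoids(points: List[Point], labels: List[int], medoids: List[int]) -> List[int]:
    """Replace each medoid by the cluster member with the smallest sum of
    distances to the other members; on ties the current medoid is kept."""
    new_medoids = list(medoids)
    for c, current in enumerate(medoids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            # Only possible when two medoids share coordinates: keep the old one.
            continue

        def within_cost(i: int) -> float:
            # Only within-cluster distances are computed here.
            return sum(math.dist(points[i], points[j]) for j in members)

        new_medoids[c] = min(members, key=lambda i: (within_cost(i), i != current))
    return new_medoids


def k_medoids(
    points: List[Point],
    k: int,
    init: Optional[Sequence[int]] = None,
    seed: int = 0,
    max_iter: int = 100,
) -> Tuple[List[int], List[int], float]:
    """Alternate assignment and medoid update until the medoids stop changing."""
    if init is not None:
        medoids = list(init)
    else:
        medoids = random.Random(seed).sample(range(len(points)), k)
    labels, cost = nearest_medoid_labels(points, medoids)
    for _ in range(max_iter):
        new_medoids = update_medoids(points, labels, medoids)
        if new_medoids == medoids:
            break
        medoids = new_medoids
        labels, cost = nearest_medoid_labels(points, medoids)
    return medoids, labels, cost
```

For example, with the two well-separated groups `(0, 0), (0, 1), (1, 0)` and `(10, 10), (10, 11), (11, 10)`, calling `k_medoids(points, 2, init=[0, 1])` moves the second medoid to index 3 after one update and returns medoids `[0, 3]` with a total cost of `4.0`. The medoid update in this sketch is still quadratic in each cluster's size, so on its own it does not reach the near-linear behavior the abstract attributes to pruning.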

References

  • Amorèse, D., Bossu, R., & Mazet-Roux, G. (2015). Automatic clustering of macroseismic intensity data points from internet questionnaires: Efficiency of the partitioning around medoids (PAM). Seismological Research Letters, 86, 1171–1177.

  • Arumugam, M., Raes, J., & Pelletier, E. (2011). Enterotypes of the human gut microbiome. Nature, 473, 174–180.

  • Ayyala, D., & Lin, S. (2015). GrammR: Graphical representation and modeling of count data with application in metagenomics. Bioinformatics, 31, 1648–1654.

  • Bach, F.R., & Jordan, M.I. (2004, December). Blind one-microphone speech separation: A spectral learning approach. In Proceedings of the 17th International Conference on Neural Information Processing Systems (NIPS’04) (pp. 65–72). MIT Press.

  • Broin, P. Ó., Smith, T., & Golden, A. (2015). Alignment-free clustering of transcription factor binding motifs using a genetic-k-medoids approach. BMC Bioinformatics, 16, 1–12.

  • Han, J., Kamber, M., & Tung, A.K.H. (2001). Spatial clustering methods in data mining: A survey. In H. J. Miller & J. Han (Eds.), Geographic data mining and knowledge discovery. Taylor & Francis.

  • Jain, A.K. (2008). Data clustering: 50 years beyond K-means. In W. Daelemans, B. Goethals, & K. Morik (Eds.), Machine learning and knowledge discovery in databases (ECML PKDD 2008), Lecture notes in computer science (Vol. 5211, pp. 3–4). Springer, Berlin, Heidelberg.

  • Kaufman, L., & Rousseeuw, P.J. (1987). Clustering by means of medoids. In Y. Dodge (Ed.), Statistical data analysis based on the norm and related methods (pp. 405–416). North-Holland.

  • Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: An introduction to cluster analysis. New York: Wiley.

  • Khatami, A., Mirghasemi, S., Khosravi, A., Lim, C. P., & Nahavandi, S. (2017). A new PSO-based approach to fire flame detection using K-medoids clustering. Expert Systems with Applications, 68, 69–80.

  • Lai, P.-S., & Hu, H.-C. (2011). Variance enhanced K-medoids clustering. Expert Systems with Applications, 38, 764–775.

  • Lucasius, C.B., Dane, A.D., & Kateman, G. (1993). On K-medoid clustering of large data sets with the aid of a genetic algorithm: Background, feasibility and comparison. Analytica Chimica Acta, 282, 647–669.

  • MacQueen, J.B. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability (pp. 281–297). University of California Press.

  • Malik, J., Belongie, S., Leung, T., et al. (2001). Contour and texture analysis for image segmentation. International Journal of Computer Vision, 43, 7–27.

  • Ng, R., & Han, J. (1994). Efficient and effective clustering methods for spatial data mining. In Proceedings of the 20th International Conference on Very Large Data Bases (pp. 144–155). Santiago, Chile.

  • Ohnishi, Y., Huber, W., & Tsumura, A. (2014). Cell-to-cell expression variability followed by signal reinforcement progressively segregates early mouse lineages. Nature Cell Biology, 16, 27–37.

  • Park, H.-S., & Jun, C.-H. (2009). A simple and fast algorithm for K-medoids clustering. Expert Systems with Applications, 36, 3336–3341.

  • Rodriguez, A., & Laio, A. (2014). Clustering by fast search and find of density peaks. Science, 344, 1492–1496.

  • van der Laan, M. J., Pollard, K. S., & Bryan, J. (2003). A new partitioning around medoids algorithm. Journal of Statistical Computation and Simulation, 73(8), 575–584.

  • Wei, C.-P., Lee, Y.-H., & Hsu, C.-M. (2003). Empirical comparison of fast partitioning-based clustering algorithms for large data sets. Expert Systems with Applications, 24(4), 351–363.

  • Weiss, Y. (1999, February). Segmentation using eigenvectors: A unifying view. In Proceedings of the 7th IEEE International Conference on Computer Vision (pp. 975–982).

  • Xie, J., & Qu, Y. (2016). K-medoids clustering algorithms with optimized initial seeds by density peaks. Journal of Frontiers of Computer Science and Technology, 9, 230–247.

  • Yu, D., Liu, G., Guo, M., & Liu, X. (2018). An improved K-medoids algorithm based on step increasing and optimizing medoids. Expert Systems with Applications, 92, 464–473.

  • Zadegan, S. M. R., Mirzaie, M., & Sadoughi, F. (2013). Ranked K-medoids: A fast and accurate rank-based partitioning algorithm for clustering large datasets. Knowledge-Based Systems, 39, 133–143.

  • Zhang, Q., & Couloigner, I. (2005). A new and efficient K-medoid algorithm for spatial clustering. Lecture Notes in Computer Science, 3482, 181–189.

Author information

Correspondence to Xiaochun Wang.

Copyright information

© 2020 Xi'an Jiaotong University Press

About this chapter

Cite this chapter

Wang, X., Wang, X., Wilkes, D.M. (2020). An Efficient K-Medoids Clustering Algorithm for Large Scale Data. In: Machine Learning-based Natural Scene Recognition for Mobile Robot Localization in An Unknown Environment. Springer, Singapore. https://doi.org/10.1007/978-981-13-9217-7_5
