Abstract
K-medoids clustering is a popular partition-based clustering technique for identifying patterns in a data set. Recently, much work has been devoted to finding fast algorithms for it. In this paper, we propose an efficient K-medoids clustering algorithm that follows the idea of a simple and fast K-medoids algorithm (Park & Jun, 2009), preserving its clustering performance while improving its computational efficiency. The proposed algorithm does not require pre-calculating the distance matrix, whose size grows quadratically with the number of points, and is therefore applicable to large-scale datasets. With a simple pruning rule, it can achieve near-linear time performance. We analyze the complexity of the proposed algorithm and show that it is lower than that of state-of-the-art K-medoids algorithms. We test our algorithm on real data sets with millions of examples, and the experimental results show that the proposed algorithm outperforms state-of-the-art K-medoids clustering algorithms.
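The abstract does not spell out the algorithm's steps, so the following is only a minimal sketch of the general idea. It assumes the alternating "assign points, then update medoids" scheme of Park and Jun's simple and fast K-medoids algorithm, with every distance computed on the fly instead of read from a precomputed matrix (for one million points, a dense float64 distance matrix has 10^12 entries, roughly 8 TB). The farthest-first seeding is a swapped-in initialization, not Park and Jun's, and all function names are hypothetical. The authors' pruning rule is not described in the abstract and is not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def farthest_first_seeds(points: List[Point], k: int) -> List[int]:
    """Assumed initialization (farthest-first traversal): O(n*k) distances."""
    seeds = [0]
    # nearest[i]: distance from point i to its closest seed chosen so far
    nearest = [math.dist(p, points[0]) for p in points]
    while len(seeds) < k:
        nxt = max(range(len(points)), key=nearest.__getitem__)
        if nearest[nxt] == 0.0:
            # Every point coincides with a seed (duplicates): take an unused index.
            nxt = next(i for i in range(len(points)) if i not in seeds)
        seeds.append(nxt)
        nearest = [min(d, math.dist(p, points[nxt])) for d, p in zip(nearest, points)]
    return seeds


def assign(points: List[Point], medoids: List[int]) -> Tuple[List[int], float]:
    """Label each point with the position of its nearest medoid.

    Distances are computed on demand, so no n-by-n matrix is ever stored.
    """
    labels: List[int] = []
    cost = 0.0
    for p in points:
        dists = [math.dist(p, points[m]) for m in medoids]
        j = min(range(len(medoids)), key=dists.__getitem__)
        labels.append(j)
        cost += dists[j]
    return labels, cost


def update_medoids(points: List[Point], labels: List[int], medoids: List[int]) -> List[int]:
    """Move each medoid to the cluster member with the smallest total distance
    to the other members (O(n_j^2) distances for a cluster of n_j points)."""
    new_medoids: List[int] = []
    for j, old in enumerate(medoids):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if not members:  # only possible with duplicate points; keep the old medoid
            new_medoids.append(old)
            continue
        best = min(members, key=lambda c: sum(math.dist(points[c], points[i]) for i in members))
        new_medoids.append(best)
    return new_medoids


def k_medoids(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[int], float]:
    """Alternate assignment and medoid updates until no medoid moves."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    medoids = farthest_first_seeds(points, k)
    labels, cost = assign(points, medoids)
    for _ in range(max_iter):
        new_medoids = update_medoids(points, labels, medoids)
        if new_medoids == medoids:  # converged: no medoid changed
            break
        medoids = new_medoids
        labels, cost = assign(points, medoids)
    return medoids, labels, cost


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    medoids, labels, cost = k_medoids(data, k=2)
    print(medoids, labels, cost)  # [0, 3] [0, 0, 0, 1, 1, 1] 4.0
```

Beyond the input, this sketch keeps only O(n) extra memory (the labels and per-point distances). Each iteration costs O(nk) distances for assignment, while the medoid update can cost up to O(n^2) for one large cluster, so that update is the step a pruning rule would have to cut down for the overall running time to approach linear.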
References
Amorèse, D., Bossu, R., & Mazet-Roux, G. (2015). Automatic clustering of macroseismic intensity data points from internet questionnaires: Efficiency of the partitioning around medoids (PAM). Seismological Research Letters, 86, 1171–1177.
Arumugam, M., Raes, J., & Pelletier, E. (2011). Enterotypes of the human gut microbiome. Nature, 473, 174–180.
Ayyala, D., & Lin, S. (2015). GrammR: Graphical representation and modeling of count data with application in metagenomics. Bioinformatics, 31, 1648–1654.
Bach, F. R., & Jordan, M. I. (2004, December). Blind one-microphone speech separation: A spectral learning approach. In Proceedings of the 17th International Conference on Neural Information Processing Systems (NIPS’04) (pp. 65–72). MIT Press.
Broin, P. Ó., Smith, T., & Golden, A. (2015). Alignment-free clustering of transcription factor binding motifs using a genetic-k-medoids approach. BMC Bioinformatics, 16, 1–12.
Han, J., Kamber, M., & Tung, A. K. H. (2001). Spatial clustering methods in data mining: A survey. In H. J. Miller & J. Han (Eds.), Geographic data mining and knowledge discovery. Taylor & Francis.
Jain, A. K. (2008). Data clustering: 50 years beyond K-means. In W. Daelemans, B. Goethals, & K. Morik (Eds.), Machine learning and knowledge discovery in databases. ECML PKDD 2008. Lecture notes in computer science (Vol. 5211, pp. 3–4). Springer, Berlin, Heidelberg.
Kaufman, L., & Rousseeuw, P. J. (1987). Clustering by means of medoids. In Y. Dodge (Ed.), Statistical data analysis based on the L1-norm and related methods (pp. 405–416). North-Holland.
Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: An introduction to cluster analysis. New York: Wiley.
Khatami, A., Mirghasemi, S., Khosravi, A., Lim, C. P., & Nahavandi, S. (2017). A new PSO-based approach to fire flame detection using K-medoids clustering. Expert Systems with Applications, 68, 69–80.
Lai, P.-S., & Hu, H.-C. (2011). Variance enhanced K-medoids clustering. Expert Systems with Applications, 38, 764–775.
Lucasius, C. B., Dane, A. D., & Kateman, G. (1993). On K-medoid clustering of large data sets with the aid of a genetic algorithm: Background, feasibility and comparison. Analytica Chimica Acta, 282, 647–669.
MacQueen, J. B. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability (pp. 281–297). University of California Press.
Malik, J., Belongie, S., Leung, T., et al. (2001). Contour and texture analysis for image segmentation. International Journal of Computer Vision, 43, 7–27.
Ng, R., & Han, J. (1994). Efficient and effective clustering methods for spatial data mining. In Proceedings of the 20th International Conference on Very Large Data Bases (pp. 144–155). Santiago, Chile.
Ohnishi, Y., Huber, W., & Tsumura, A. (2014). Cell-to-cell expression variability followed by signal reinforcement progressively segregates early mouse lineages. Nature Cell Biology, 16, 27–37.
Park, H.-S., & Jun, C.-H. (2009). A simple and fast algorithm for K-medoids clustering. Expert Systems with Applications, 36, 3336–3341.
Rodriguez, A., & Laio, A. (2014). Clustering by fast search and find of density peaks. Science, 344, 1492–1496.
van der Laan, M. J., Pollard, K. S., & Bryan, J. (2003). A new partitioning around medoids algorithm. Journal of Statistical Computation and Simulation, 73(8), 575–584.
Wei, C.-P., Lee, Y.-H., & Hsu, C.-M. (2003). Empirical comparison of fast partitioning-based clustering algorithms for large data sets. Expert Systems with Applications, 24(4), 351–363.
Weiss, Y. (1999, February). Segmentation using eigenvectors: A unified view. In Proceedings of the 7th IEEE International Conference on Computer Vision (pp. 975–982).
Xie, J., & Qu, Y. (2016). K-medoids clustering algorithms with optimized initial seeds by density peaks. Journal of Frontiers of Computer Science and Technology, 9, 230–247.
Yu, D., Liu, G., Guo, M., & Liu, X. (2018). An improved K-medoids algorithm based on step increasing and optimizing medoids. Expert Systems with Applications, 92, 464–473.
Zadegan, S. M. R., Mirzaie, M., & Sadoughi, F. (2013). Ranked K-medoids: A fast and accurate rank-based partitioning algorithm for clustering large datasets. Knowledge-Based Systems, 39, 133–143.
Zhang, Q., & Couloigner, I. (2005). A new and efficient K-medoid algorithm for spatial clustering. Lecture Notes in Computer Science, 3482, 181–189.
© 2020 Xi'an Jiaotong University Press
About this chapter
Cite this chapter
Wang, X., Wang, X., Wilkes, D.M. (2020). An Efficient K-Medoids Clustering Algorithm for Large Scale Data. In: Machine Learning-based Natural Scene Recognition for Mobile Robot Localization in An Unknown Environment. Springer, Singapore. https://doi.org/10.1007/978-981-13-9217-7_5
DOI: https://doi.org/10.1007/978-981-13-9217-7_5
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9216-0
Online ISBN: 978-981-13-9217-7
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)