An Efficient K-Medoids Clustering Algorithm for Large Scale Data

Wang, Xiaochun; Wang, Xiali; Wilkes, Don Mitchell

doi:10.1007/978-981-13-9217-7_5

Abstract

K-medoids clustering is a popular partition-based clustering technique for identifying typical patterns in a data set. Recently, much work has aimed at finding fast algorithms for it. In this paper, we propose an efficient K-medoids clustering algorithm that follows the approach of a simple and fast K-medoids algorithm, preserving its clustering performance while improving computational efficiency. Because the proposed algorithm does not require a precomputed distance matrix, it is applicable to large-scale data sets. When a simple pruning rule is used, it achieves near-linear running time. We analyze the complexity of the proposed algorithm and find it to be lower than that of state-of-the-art K-medoids algorithms. We test our algorithm on real data sets with millions of examples, and the experimental results show that it outperforms state-of-the-art K-medoids clustering algorithms.
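The abstract does not spell out the proposed algorithm or its pruning rule. To illustrate what avoiding a precomputed distance matrix can look like, here is a minimal sketch of a generic alternating (Voronoi-iteration) K-medoids loop that computes distances on demand. It is not the chapter's algorithm: the function name `kmedoids_alternate`, the Euclidean metric, the random initialization, and the stopping rule are all assumptions of this sketch, and no pruning is included.

```python
import numpy as np


def kmedoids_alternate(X, k, init=None, max_iter=100, seed=0):
    """Generic alternating (Voronoi-iteration) K-medoids, Euclidean distance.

    Illustrative sketch only (hypothetical helper, not the chapter's method):
    distances are computed on demand, so the full n-by-n distance matrix is
    never materialized.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if init is None:
        # Random initial medoids (an assumption; other seeding rules exist).
        medoids = np.random.default_rng(seed).choice(n, size=k, replace=False)
    else:
        medoids = np.asarray(init, dtype=int)

    def assign(meds):
        # n-by-k block of distances from every point to the current medoids.
        d = np.linalg.norm(X[:, None, :] - X[meds][None, :, :], axis=2)
        return d.argmin(axis=1)

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue  # keep the old medoid if its cluster is empty
            pts = X[members]
            # Pairwise distances inside cluster j only (m_j-by-m_j).
            within = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
            # New medoid: the member with the smallest total distance to the others.
            new_medoids[j] = members[within.sum(axis=1).argmin()]
        if np.array_equal(new_medoids, medoids):
            break  # medoids stopped moving
        medoids = new_medoids
    return medoids, assign(medoids)


X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
medoids, labels = kmedoids_alternate(X, k=2, init=[0, 1])
print(medoids.tolist(), labels.tolist())  # [0, 3] [0, 0, 0, 1, 1, 1]
```

Even though both starting medoids lie in the same blob, the second one migrates to the far blob after one update. Note that the update step is still quadratic in each cluster's size; reaching the near-linear behavior the abstract reports would require something like the pruning rule the chapter describes, which this sketch does not attempt to reproduce.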

Author information

Authors and Affiliations

School of Software Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi, China
Xiaochun Wang
School of Information Engineering, Chang’an University, Xi’an, Shaanxi, China
Xiali Wang
Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA
Don Mitchell Wilkes

Corresponding author

Correspondence to Xiaochun Wang.

About this chapter

Cite this chapter

Wang, X., Wang, X., Wilkes, D.M. (2020). An Efficient K-Medoids Clustering Algorithm for Large Scale Data. In: Machine Learning-based Natural Scene Recognition for Mobile Robot Localization in An Unknown Environment. Springer, Singapore. https://doi.org/10.1007/978-981-13-9217-7_5

DOI: https://doi.org/10.1007/978-981-13-9217-7_5
Published: 13 August 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9216-0
Online ISBN: 978-981-13-9217-7
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)