Abstract
Data clustering is an unsupervised classification method that partitions objects into distinct groups, or clusters. Among clustering techniques, K-means is the most widely used. Two issues are prominent in designing a K-means clustering algorithm: the optimal number of clusters and the placement of the cluster centers. In most cases the number of clusters is fixed in advance by the researcher, which leaves the challenge of where to place the cluster centers so that scattered points are grouped properly. If the centers are not chosen well, the computational complexity increases, especially for high-dimensional data sets. To obtain an optimum solution for K-means cluster analysis, the data need to be preprocessed, either by standardizing the data or by applying principal component analysis to the scaled data to reduce its dimensionality. Based on the outcomes of this preprocessing, a hybrid K-means clustering method of center initialization is developed that produces optimum-quality clusters and makes the algorithm more efficient. The results showed that K-means with preprocessed data performed better, judging by the sum of squared errors. A further experiment with the hybrid K-means algorithm was conducted on simulated datasets, and it was observed that the total clustering error was reduced significantly while the distances between clusters were kept as large as possible for better cluster identification.
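The pipeline described in the abstract (standardize the data, run K-means, judge the result by the sum of squared errors) can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's hybrid method: the center initialization here is plain random sampling, the principal component analysis step is omitted, and the toy data and the function names `standardize`, `sq_dist`, and `kmeans` are hypothetical.

```python
import math
import random

def standardize(rows):
    """Z-score each column: subtract the column mean, divide by its std. dev."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population standard deviation; fall back to 1.0 for constant columns.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, seed=0, max_iter=100):
    """Plain Lloyd's algorithm with randomly sampled initial centers.

    Assumption: random initialization stands in for the chapter's hybrid
    center initialization, which the abstract does not specify.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    # Sum of squared errors: squared distance of each point to its center.
    sse = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return labels, centers, sse

# Hypothetical toy data: two features measured on very different scales.
data = [[1.0, 100.0], [1.2, 120.0], [0.8, 110.0],
        [5.0, 900.0], [5.2, 950.0], [4.8, 920.0]]

scaled = standardize(data)
labels, centers, sse = kmeans(scaled, k=2)
print("labels:", labels)
print("SSE on standardized data:", round(sse, 4))
```

Note that the sum of squared errors depends on the units of the data, so a value computed on standardized data is not directly comparable with one computed on the raw data. Comparisons between initialization methods are meaningful only when both run on the same preprocessed data.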
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Usman, D., Mohamad, I.B. (2017). A Hybrid K-Means Algorithm Combining Preprocessing-Wise and Centroid Based-Criteria for High Dimension Datasets. In: Ahmad, AR., Kor, L., Ahmad, I., Idrus, Z. (eds) Proceedings of the International Conference on Computing, Mathematics and Statistics (iCMS 2015). Springer, Singapore. https://doi.org/10.1007/978-981-10-2772-7_11
DOI: https://doi.org/10.1007/978-981-10-2772-7_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2770-3
Online ISBN: 978-981-10-2772-7
eBook Packages: Education, Education (R0)