Coherent Method for Determining the Initial Cluster Center
Much current research on clustering focuses on finding near-optimal cluster centers and on partitioning objects into the best possible clusters, because a poor choice of cluster center can pull a data point far away from its actual cluster and produce deficient clusterings. We therefore concentrate on determining near-optimal cluster centers and on placing each data point in its true cluster. We explore three clustering techniques, viz. K-Means, FEKM-based, and TLBO-based clustering, applied to several data sets. The analysis considers two factors, namely cluster validation and average quantization error. Dunn's index, the Davies–Bouldin index, the silhouette coefficient, and the C index were used for quantitative evaluation of the clustering results. As anticipated, almost all validity indices give more promising results for FEKM- and TLBO-based clustering than for K-Means, indicating superior cluster formation. Further tests confirm that FEKM- and TLBO-based clustering also yield a smaller quantization error than K-Means.
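The average quantization error used for the comparison above can be sketched as follows. This is an illustrative implementation only, not the authors' code: it uses one common definition of quantization error (the mean, over clusters, of the average distance from each member point to its centroid) together with plain Lloyd's K-Means, and the toy data and function names are our own assumptions.

```python
import numpy as np

def quantization_error(points, centers, labels):
    """Mean, over clusters, of the average distance from each
    member point to its centroid (one common definition; the
    paper's exact formula may differ)."""
    errs = []
    for k in range(len(centers)):
        members = points[labels == k]
        if len(members) == 0:
            continue
        dists = np.linalg.norm(members - centers[k], axis=1)
        errs.append(dists.mean())
    return float(np.mean(errs))

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's K-Means with randomly chosen initial centers
    (the baseline the paper compares FEKM and TLBO against)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each point to its nearest center
        d = np.linalg.norm(points[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        # recompute each center as the mean of its members
        new = np.array([points[labels == j].mean(axis=0)
                        if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels

# toy data: two well-separated 2-D blobs
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
                 rng.normal(5.0, 0.5, (50, 2))])
centers, labels = kmeans(pts, k=2)
print(quantization_error(pts, centers, labels))
```

A better initialization scheme (such as FEKM or TLBO) would replace the random center selection in `kmeans`, and its effect shows up directly as a lower value from `quantization_error`.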
Keywords: Optimal centroid; Cluster validation; K-Means; FEKM- and TLBO-based clustering
We are extremely thankful to Sagarika Swain, who provided expertise that greatly assisted this work. The authors also express their gratitude to the editors and the anonymous referees for their constructive suggestions on the paper.