Abstract
Clustering is one of the most important unsupervised learning tasks; its aim is to partition a data set into homogeneous groups called clusters. Real-world data sets often contain categorical values, but many clustering algorithms work only on numeric values, which limits their use in data mining. The k-modes algorithm is one of the most effective methods for partitioning categorical data sets, but it stops at a locally optimal solution that depends on the initial cluster centres. The proposed algorithm uses a genetic algorithm (GA) to optimize k-modes clustering. The rationale is that a candidate solution which takes noise points as cluster centres incurs a high cost and is therefore unlikely to survive into the next iteration, so the search does not get stuck in suboptimal solutions. The proposed algorithm is evaluated on several real-life data sets in terms of accuracy; the results indicate that it is efficient and gives encouraging results, especially for large data sets.
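The abstract does not specify the chromosome encoding or the genetic operators, so the following Python sketch is only one plausible reading and not the authors' implementation. It assumes that a chromosome is a list of k candidate modes, that fitness is the standard k-modes cost under simple matching dissimilarity, and that each child is refined by one k-modes step. The function names, the tournament selection, the one-point crossover, and all default parameters are illustrative assumptions.

```python
import random
from collections import Counter

def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))

def assign(data, modes):
    """Index of the nearest mode for every record."""
    return [min(range(len(modes)), key=lambda j: mismatch(r, modes[j]))
            for r in data]

def cost(data, modes):
    """k-modes objective: total mismatch between records and nearest mode."""
    return sum(min(mismatch(r, m) for m in modes) for r in data)

def refine(data, modes):
    """One k-modes step: reassign records, recompute attribute-wise modes."""
    labels = assign(data, modes)
    new_modes = []
    for j, old in enumerate(modes):
        members = [r for r, lab in zip(data, labels) if lab == j]
        if not members:  # empty cluster: keep the previous mode
            new_modes.append(old)
            continue
        new_modes.append(tuple(Counter(col).most_common(1)[0][0]
                               for col in zip(*members)))
    return new_modes

def ga_k_modes(data, k, pop_size=10, generations=20,
               mutation_rate=0.1, seed=0):
    """Assumed GA wrapper around k-modes; returns (best_modes, best_cost)."""
    rng = random.Random(seed)
    # Each chromosome starts as k distinct records drawn from the data.
    population = [rng.sample(data, k) for _ in range(pop_size)]
    best = min(population, key=lambda c: cost(data, c))
    for _ in range(generations):
        children = []
        while len(children) < pop_size - 1:
            # Tournament selection: the cheaper of two random chromosomes.
            # High-cost chromosomes (e.g. noise as centres) rarely win.
            p1 = min(rng.sample(population, 2), key=lambda c: cost(data, c))
            p2 = min(rng.sample(population, 2), key=lambda c: cost(data, c))
            # One-point crossover on the list of modes.
            cut = rng.randrange(1, k) if k > 1 else 0
            child = list(p1[:cut]) + list(p2[cut:])
            # Mutation: replace one mode with a random record.
            if rng.random() < mutation_rate:
                child[rng.randrange(k)] = rng.choice(data)
            children.append(refine(data, child))
        # Elitism: the best chromosome so far always survives.
        population = children + [best]
        best = min(population, key=lambda c: cost(data, c))
    return best, cost(data, best)

if __name__ == "__main__":
    toy = [("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
           ("b", "z", "r"), ("b", "z", "s"), ("b", "w", "r")]
    modes, total = ga_k_modes(toy, k=2)
    print(modes, total)
```

Because the best chromosome is carried over unchanged, the best cost in this sketch never increases from one generation to the next, while crossover and mutation give the search a chance to leave a poor initialization that plain k-modes would be stuck with.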
Acknowledgements
This work is supported by a research grant from MPCST Bhopal under grant no. 1080/CST/R&D/2012, dated 30-06-2012.
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Sharma, A., Thakur, R.S. (2018). Improved Clustering for Categorical Data with Genetic Algorithm. In: Nath, V. (eds) Proceedings of the International Conference on Microelectronics, Computing & Communication Systems. Lecture Notes in Electrical Engineering, vol 453. Springer, Singapore. https://doi.org/10.1007/978-981-10-5565-2_6
DOI: https://doi.org/10.1007/978-981-10-5565-2_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-5564-5
Online ISBN: 978-981-10-5565-2
eBook Packages: Engineering, Engineering (R0)