Improved Clustering for Categorical Data with Genetic Algorithm

Sharma, Abha; Thakur, R. S.

doi:10.1007/978-981-10-5565-2_6

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 453)

Abstract

Clustering is a key unsupervised learning task whose aim is to partition a data set into homogeneous groups called clusters. Many real-world data sets contain categorical values, but many clustering algorithms work only on numeric values, which limits their use in data mining. The k-modes algorithm is one of the most effective methods for partitioning categorical data sets. However, because it depends on the initial cluster centres, it can stop at a locally optimal solution. The proposed algorithm uses a genetic algorithm (GA) to optimize k-modes clustering. The rationale is that candidate solutions that use noisy points as cluster centres incur a high clustering cost, so they are unlikely to be carried into the next iteration, and the search does not get stuck in suboptimal solutions. The superiority of the proposed algorithm is demonstrated on several real-life data sets in terms of accuracy. The results show that it is efficient and gives encouraging results, especially for large data sets.
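
The two ingredients described in the abstract, k-modes and a GA that searches over its initial centres, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The dissimilarity is the simple matching measure commonly used with k-modes. The chromosome encoding (k record indices used as initial modes), truncation selection, the distinctness-preserving one-point crossover, random-replacement mutation, all parameter values, and every function name (`mismatch`, `k_modes`, `ga_k_modes`, etc.) are hypothetical choices. Fitness is the k-modes cost after convergence, so chromosomes whose centres are noisy records score a high cost and tend to be dropped.

```python
import random
from collections import Counter

# A record is a tuple of categorical attribute values, e.g. ("red", "small", "round").


def mismatch(a, b):
    """Simple matching dissimilarity: count of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def assign(data, modes):
    """Label each record with the index of its nearest mode (ties go to the lower index)."""
    return [min(range(len(modes)), key=lambda j: mismatch(r, modes[j])) for r in data]


def cost(data, modes):
    """k-modes objective: total dissimilarity of each record to its nearest mode."""
    return sum(min(mismatch(r, m) for m in modes) for r in data)


def update_modes(data, labels, modes):
    """Recompute each cluster's mode as the most frequent value per attribute."""
    new_modes = []
    for j, old in enumerate(modes):
        members = [r for r, label in zip(data, labels) if label == j]
        if not members:
            new_modes.append(old)  # keep the old mode of an empty cluster
            continue
        new_modes.append(tuple(Counter(col).most_common(1)[0][0] for col in zip(*members)))
    return new_modes


def k_modes(data, init_modes, max_iter=20):
    """Plain k-modes from the given initial modes; stops when the modes no longer change."""
    modes = list(init_modes)
    for _ in range(max_iter):
        new_modes = update_modes(data, assign(data, modes), modes)
        if new_modes == modes:
            break
        modes = new_modes
    return modes


def ga_k_modes(data, k, pop_size=10, generations=15, mutation_rate=0.2, seed=0):
    """Hypothetical GA over initial centres; fitness is the k-modes cost after convergence."""
    rng = random.Random(seed)
    n = len(data)
    # Chromosome (assumed encoding): k distinct record indices used as the initial modes.
    population = [rng.sample(range(n), k) for _ in range(pop_size)]
    best_modes, best_cost = None, float("inf")
    for _ in range(generations):
        scored = []
        for chrom in population:
            modes = k_modes(data, [data[i] for i in chrom])
            c = cost(data, modes)
            scored.append((c, chrom))
            if c < best_cost:
                best_modes, best_cost = modes, c
        # Truncation selection: keep the lower-cost half as parents.
        scored.sort(key=lambda t: t[0])
        parents = [chrom for _, chrom in scored[: pop_size // 2]]
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            # One-point crossover that keeps the k indices distinct.
            cut = rng.randint(1, k - 1) if k > 1 else 0
            head = p1[:cut]
            child = head + [g for g in p2 if g not in head][: k - cut]
            # Mutation: replace one centre with a random record not already in the chromosome.
            if rng.random() < mutation_rate:
                candidates = [i for i in range(n) if i not in child]
                if candidates:
                    child[rng.randrange(k)] = rng.choice(candidates)
            children.append(child)
        population = parents + children
    return best_modes, best_cost


if __name__ == "__main__":
    toy = [
        ("red", "small", "round"), ("red", "small", "round"), ("red", "small", "oval"),
        ("blue", "large", "square"), ("blue", "large", "square"), ("blue", "large", "rect"),
    ]
    modes, total = ga_k_modes(toy, k=2)
    print(sorted(modes), total)
```

In this sketch the GA only chooses starting points and k-modes does the local refinement. A chromosome that starts from two near-identical records (or from outliers) converges to a high-cost partition and is filtered out by selection.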

Acknowledgements

The work is supported by research grant from MPCST Bhopal, under grant no. 1080/CST/R&D/2012 dated 30-06-2012.

Author information

Authors and Affiliations

Maulana Azad National Institute of Technology, Bhopal, India
Abha Sharma & R. S. Thakur

Corresponding author

Correspondence to Abha Sharma.

Editor information

Editors and Affiliations

Department of Electronics & Communication, Birla Institute of Technology, Mesra, Ranchi, Jharkhand, India
Vijay Nath

About this paper

Cite this paper

Sharma, A., Thakur, R.S. (2018). Improved Clustering for Categorical Data with Genetic Algorithm. In: Nath, V. (ed.) Proceedings of the International Conference on Microelectronics, Computing & Communication Systems. Lecture Notes in Electrical Engineering, vol 453. Springer, Singapore. https://doi.org/10.1007/978-981-10-5565-2_6

DOI: https://doi.org/10.1007/978-981-10-5565-2_6
Published: 30 December 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-5564-5
Online ISBN: 978-981-10-5565-2
eBook Packages: Engineering, Engineering (R0)