Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 453))

Abstract

Clustering is one of the most important unsupervised learning tasks; its aim is to partition a data set into homogeneous groups called clusters. Real-world data sets often contain categorical values, but many clustering algorithms work only on numeric values, which limits their use in data mining. The k-modes algorithm is effective at partitioning categorical data sets, but it can stop at a locally optimal solution because its result depends on the initial cluster centres. The proposed algorithm uses a genetic algorithm (GA) to optimize k-modes clustering. The rationale is that a candidate solution that takes noise points as cluster centres has a high cost and is therefore not carried into the next iteration, so the search is less likely to get stuck in suboptimal solutions. The proposed algorithm is evaluated on several real-life data sets in terms of accuracy; the results indicate that it is efficient and gives encouraging results, especially for large data sets.
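The abstract does not specify the GA operators, so the following is only a minimal illustrative sketch of how a GA might wrap k-modes, not the authors' implementation. It assumes the simple matching dissimilarity as the cost, chromosomes that are lists of k modes drawn from the data, truncation selection with elitism, and a mutation that swaps a mode for a random record; the function names and the parameters pop_size, generations and mutation_rate are hypothetical.

```python
import random
from collections import Counter

def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))

def cost(data, modes):
    """k-modes objective: each record's distance to its nearest mode, summed."""
    return sum(min(mismatch(row, m) for m in modes) for row in data)

def update_modes(data, modes):
    """One k-modes step: assign records to the nearest mode, then recompute modes."""
    labels = [min(range(len(modes)), key=lambda j: mismatch(row, modes[j]))
              for row in data]
    new_modes = []
    for j, old in enumerate(modes):
        members = [row for row, lab in zip(data, labels) if lab == j]
        if not members:
            new_modes.append(old)  # empty cluster: keep its previous mode
            continue
        # Most frequent category in every attribute column of the cluster.
        new_modes.append(tuple(Counter(col).most_common(1)[0][0]
                               for col in zip(*members)))
    return new_modes

def ga_kmodes(data, k, pop_size=10, generations=20, mutation_rate=0.1, seed=0):
    """Assumed GA wrapper: refine, rank by cost, keep the cheaper half, mutate."""
    rng = random.Random(seed)
    population = [rng.sample(data, k) for _ in range(pop_size)]
    best = list(min(population, key=lambda m: cost(data, m)))
    for _ in range(generations):
        population = [update_modes(data, m) for m in population]
        population.sort(key=lambda m: cost(data, m))
        if cost(data, population[0]) < cost(data, best):
            best = list(population[0])
        # High-cost chromosomes (e.g. noise chosen as a centre) are dropped here.
        survivors = population[: max(1, pop_size // 2)]
        children = []
        for _ in range(pop_size - 1):
            child = list(rng.choice(survivors))
            for j in range(k):
                if rng.random() < mutation_rate:
                    child[j] = rng.choice(data)  # mutation: random record as mode
            children.append(child)
        population = children + [list(best)]  # elitism keeps the best so far
    return best, cost(data, best)
```

Records are expected as equal-length tuples of categorical values, for example ga_kmodes([("a", "x"), ("b", "y"), ("a", "y")], k=2). The selection step is where a chromosome that uses an outlier as a centre would be filtered out by its high cost, which is the intuition the abstract gives.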

Acknowledgements

This work is supported by a research grant from MPCST Bhopal, under grant no. 1080/CST/R&D/2012 dated 30-06-2012.

Author information

Corresponding author

Correspondence to Abha Sharma.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sharma, A., Thakur, R.S. (2018). Improved Clustering for Categorical Data with Genetic Algorithm. In: Nath, V. (eds) Proceedings of the International Conference on Microelectronics, Computing & Communication Systems. Lecture Notes in Electrical Engineering, vol 453. Springer, Singapore. https://doi.org/10.1007/978-981-10-5565-2_6

  • DOI: https://doi.org/10.1007/978-981-10-5565-2_6

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-5564-5

  • Online ISBN: 978-981-10-5565-2

  • eBook Packages: Engineering, Engineering (R0)
