An Improved Genetic Clustering Algorithm for Categorical Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7769)

Abstract

Deng et al. [Deng, S., He, Z., Xu, X.: G-ANMI: A mutual information based genetic clustering algorithm for categorical data. Knowledge-Based Systems 23, 144–149 (2010)] proposed G-ANMI, a mutual information based genetic clustering algorithm for categorical data. Although G-ANMI matches or exceeds existing categorical clustering algorithms in clustering accuracy, it is very time-consuming because the underlying genetic algorithm (GA) is inefficient. In this paper, we propose a new initialization method for G-ANMI to improve its efficiency. Experimental results show that the new method greatly improves the efficiency of G-ANMI and also yields higher clustering accuracy.
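
The abstract does not spell out the objective that G-ANMI optimizes. As a rough illustration only, the sketch below assumes that ANMI is the average normalized mutual information between a candidate partition and the partition induced by each categorical attribute, with mutual information normalized by the square root of the product of the two entropies. That normalization, and the function names `entropy`, `mutual_information`, `nmi`, and `anmi`, are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two equally long label sequences."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """Normalized mutual information; sqrt normalization is an assumption."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0 or hb == 0:
        # A constant labeling carries no information about the other one.
        return 0.0
    return mutual_information(a, b) / sqrt(ha * hb)


def anmi(partition, rows):
    """Average NMI between a partition and each attribute column of rows."""
    columns = list(zip(*rows))
    return sum(nmi(partition, col) for col in columns) / len(columns)


# Hypothetical toy data: four objects described by two categorical attributes.
rows = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
aligned = anmi([0, 0, 1, 1], rows)    # partition agrees with both attributes
unrelated = anmi([0, 1, 0, 1], rows)  # partition independent of both attributes
```

Under these assumptions, a genetic search would treat each candidate partition as a chromosome and use a score like `anmi` as its fitness, so the quality of the initial population directly affects how many generations are needed.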

References

  1. Guha, S., Rastogi, R., Shim, K.: ROCK: a robust clustering algorithm for categorical attributes. Information Systems 25(5), 345–366 (2000)

  2. Huang, Z.: Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery 2(3), 283–304 (1998)

  3. Barbara, D., Li, Y., Couto, J.: COOLCAT: an entropy-based algorithm for categorical clustering. In: Proc. of CIKM 2002, pp. 582–589 (2002)

  4. He, Z., Xu, X., Deng, S.: k-ANMI: a mutual information based clustering algorithm for categorical data. Information Fusion 9(2), 223–233 (2008)

  5. He, Z., Xu, X., Deng, S.: A cluster ensemble method for clustering categorical data. Information Fusion 6(2), 143–151 (2005)

  6. Cristofor, D., Simovici, D.: Finding median partitions using information-theoretical-based genetic algorithms. Journal of Universal Computer Science 8(2), 153–172 (2002)

  7. Deng, S., He, Z., Xu, X.: G-ANMI: A mutual information based genetic clustering algorithm for categorical data. Knowledge-Based Systems 23, 144–149 (2010)

  8. Holland, J.H.: Adaptation in Natural and Artificial Systems. MIT Press (1992)

  9. Bai, L., Liang, J.Y., Dang, C.Y.: An initialization method to simultaneously find initial cluster and the number of clusters for clustering categorical data. Knowledge-Based Systems 24, 785–795 (2011)

  10. Herawan, T., Deris, M.M., Abawajy, J.H.: A rough set approach for selecting clustering attribute. Knowledge-Based Systems 23, 220–231 (2010)

  11. UCI Machine Learning Repository (2011), http://www.ics.uci.edu/~mlearn/MLRepository.html

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Qin, H., Ma, X., Herawan, T., Zain, J.M. (2013). An Improved Genetic Clustering Algorithm for Categorical Data. In: Washio, T., Luo, J. (eds) Emerging Trends in Knowledge Discovery and Data Mining. PAKDD 2012. Lecture Notes in Computer Science, vol 7769. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36778-6_9

  • DOI: https://doi.org/10.1007/978-3-642-36778-6_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36777-9

  • Online ISBN: 978-3-642-36778-6

  • eBook Packages: Computer Science, Computer Science (R0)
