An Improved Genetic Clustering Algorithm for Categorical Data

Qin, Hongwu; Ma, Xiuqin; Herawan, Tutut; Zain, Jasni Mohamad

doi:10.1007/978-3-642-36778-6_9

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7769)

Abstract

Deng et al. [Deng, S., He, Z., Xu, X.: G-ANMI: A mutual information based genetic clustering algorithm for categorical data. Knowledge-Based Systems 23, 144–149 (2010)] proposed G-ANMI, a mutual-information-based genetic clustering algorithm for categorical data. Although G-ANMI matches or exceeds existing algorithms for clustering categorical data in terms of clustering accuracy, it is very time-consuming because of the low efficiency of the genetic algorithm (GA). In this paper, we propose a new initialization method for G-ANMI to improve its efficiency. Experimental results show that the new method greatly improves the efficiency of G-ANMI and also yields higher clustering accuracy.
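
The abstract does not define the objective that G-ANMI optimizes. As a rough illustration only, the sketch below assumes the definition commonly given in the cited k-ANMI and G-ANMI papers: the average normalized mutual information (ANMI) between a candidate partition and the partitions induced by each categorical attribute, with mutual information normalized by the geometric mean of the two entropies. The function names (`entropy`, `mutual_information`, `nmi`, `anmi`) and the toy data are hypothetical, and the sketch does not reproduce the initialization method proposed in this paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same objects."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (n_xy / n) * math.log(n_xy * n / (count_a[x] * count_b[y]))
        for (x, y), n_xy in joint.items()
    )


def nmi(a, b):
    """NMI normalized by the geometric mean of the entropies (assumed form)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 or h_b == 0:
        return 0.0  # a single-cluster labeling carries no information
    return mutual_information(a, b) / math.sqrt(h_a * h_b)


def anmi(partition, rows):
    """Average NMI between a partition and each attribute-induced partition."""
    columns = list(zip(*rows))  # one tuple of values per categorical attribute
    return sum(nmi(partition, col) for col in columns) / len(columns)


# Toy data set (hypothetical): four objects, two categorical attributes.
rows = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
good = anmi([0, 0, 1, 1], rows)  # partition agrees with both attributes
bad = anmi([0, 1, 0, 1], rows)   # partition is independent of both attributes
```

In a genetic algorithm of this kind, a function like `anmi` would serve as the fitness of each candidate partition, so the cost of evaluating it over many generations is one plausible reason why a better initial population could shorten the search.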

References

Guha, S., Rastogi, R., Shim, K.: ROCK: a robust clustering algorithm for categorical attributes. Information Systems 25(5), 345–366 (2000)
Huang, Z.: Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery 2(3), 283–304 (1998)
Barbara, D., Li, Y., Couto, J.: COOLCAT: an entropy-based algorithm for categorical clustering. In: Proc. of CIKM 2002, pp. 582–589 (2002)
He, Z., Xu, X., Deng, S.: k-ANMI: a mutual information based clustering algorithm for categorical data. Information Fusion 9(2), 223–233 (2008)
He, Z., Xu, X., Deng, S.: A cluster ensemble method for clustering categorical data. Information Fusion 6(2), 143–151 (2005)
Cristofor, D., Simovici, D.: Finding median partitions using information-theoretical-based genetic algorithms. Journal of Universal Computer Science 8(2), 153–172 (2002)
Deng, S., He, Z., Xu, X.: G-ANMI: A mutual information based genetic clustering algorithm for categorical data. Knowledge-Based Systems 23, 144–149 (2010)
Holland, J.H.: Adaptation in Natural and Artificial Systems. MIT Press (1992)
Bai, L., Liang, J.Y., Dang, C.Y.: An initialization method to simultaneously find initial cluster and the number of clusters for clustering categorical data. Knowledge-Based Systems 24, 785–795 (2011)
Herawan, T., Deris, M.M., Abawajy, J.H.: A rough set approach for selecting clustering attribute. Knowledge-Based Systems 23, 220–231 (2010)
UCI Machine Learning Repository (2011), http://www.ics.uci.edu/~mlearn/MLRepository.html

Author information

Authors and Affiliations

Faculty of Computer Systems and Software Engineering, Universiti Malaysia Pahang, Lebuh Raya Tun Razak, Gambang, 26300, Kuantan, Malaysia
Hongwu Qin, Xiuqin Ma, Tutut Herawan & Jasni Mohamad Zain

Editor information

Editors and Affiliations

ISIR, Osaka University, 8-1, Mihogaoka, Ibaraki, Osaka, Japan
Takashi Washio
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Boulevard, 518055, Shenzhen, Guangdong, China
Jun Luo

About this paper

Cite this paper

Qin, H., Ma, X., Herawan, T., Zain, J.M. (2013). An Improved Genetic Clustering Algorithm for Categorical Data. In: Washio, T., Luo, J. (eds) Emerging Trends in Knowledge Discovery and Data Mining. PAKDD 2012. Lecture Notes in Computer Science, vol 7769. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36778-6_9

DOI: https://doi.org/10.1007/978-3-642-36778-6_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36777-9
Online ISBN: 978-3-642-36778-6
eBook Packages: Computer Science, Computer Science (R0)
