The Use of Transfer Algorithm for Clustering Categorical Data

Xiang, Zhengrong; Ji, Lichuan

doi:10.1007/978-3-642-53917-6_6


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8347)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications


Abstract

We propose a new method for clustering categorical data. Clustering algorithms need to be designed specifically for categorical data because its nature differs from that of numerical data. Our focus here is on partitional algorithms. One existing approach is to transform categorical data into binary data and then apply k-means; however, this is computationally inefficient. Another approach is k-modes, which extends k-means by replacing means with modes. We show that the center-based objective function of k-modes cannot produce accurate clustering results. Instead, we propose an objective function that is generalized from the k-means objective but is not based on centers. We show that it is more effective than the center-based objective and demonstrate this on real-life datasets. We also find that the proposed objective function can be optimized efficiently with a particular algorithm called the transfer algorithm. Our method is therefore both efficient and effective.
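To make the contrast between a center-based and a center-free objective concrete, the following Python sketch may help. It is not the authors' formulation, since the abstract does not state the proposed objective. As assumptions, it uses the simple matching (Hamming) dissimilarity and a center-free cost equal to the sum of within-cluster pairwise dissimilarities divided by the cluster size, which coincides with the k-means objective when the dissimilarity is squared Euclidean distance. The function names (`hamming`, `kmodes_cost`, `pairwise_cost`, `transfer`) are hypothetical, and the transfer loop recomputes the cost from scratch for clarity rather than updating it incrementally.

```python
from collections import Counter
from itertools import combinations


def hamming(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def members_of(data, labels, c):
    """Rows currently assigned to cluster c."""
    return [row for row, lab in zip(data, labels) if lab == c]


def kmodes_cost(data, labels, k):
    """Center-based cost: total Hamming distance from each row to its cluster mode."""
    total = 0
    for c in range(k):
        members = members_of(data, labels, c)
        if not members:
            continue
        # The mode takes the most frequent value of each attribute.
        mode = tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
        total += sum(hamming(row, mode) for row in members)
    return total


def pairwise_cost(data, labels, k):
    """Center-free cost (assumed form): pairwise Hamming sum divided by cluster size."""
    total = 0.0
    for c in range(k):
        members = members_of(data, labels, c)
        if len(members) > 1:
            pair_sum = sum(hamming(a, b) for a, b in combinations(members, 2))
            total += pair_sum / len(members)
    return total


def transfer(data, labels, k, cost=pairwise_cost, max_passes=100):
    """Move one row at a time to the cluster that lowers the cost the most.

    Stops when a full pass over the data makes no move.
    """
    labels = list(labels)
    best = cost(data, labels, k)
    for _ in range(max_passes):
        moved = False
        for i in range(len(data)):
            original = labels[i]
            best_label = original
            for c in range(k):
                if c == original:
                    continue
                labels[i] = c  # tentatively transfer row i to cluster c
                trial = cost(data, labels, k)
                if trial < best - 1e-12:  # accept strict improvements only
                    best, best_label = trial, c
            labels[i] = best_label
            moved = moved or best_label != original
        if not moved:
            break
    return labels, best
```

Calling `transfer(data, initial_labels, k)` on a list of equal-length tuples returns the final labels and the final cost. Passing `cost=kmodes_cost` runs the same relocation loop against the center-based objective, which allows the two objectives to be compared on the same data.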



Author information

Authors and Affiliations

College of Computer Science, Zhejiang University, China
Zhengrong Xiang & Lichuan Ji


Editor information

Editors and Affiliations

US Air Force Office of Scientific Research, 106-0032, Tokyo, Japan
Hiroshi Motoda
School of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
Zhaohui Wu
Faculty of Engineering and Information Technology, University of Technology, Chippendale, 2008, Sydney, NSW, Australia
Longbing Cao
Department of Computing Science, Edmonton, University of Alberta, T6G 2E8, Canada
Osmar Zaiane
College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Min Yao
School of Computer Science, Fudan University, 200433, Shanghai, China
Wei Wang


About this paper

Cite this paper

Xiang, Z., Ji, L. (2013). The Use of Transfer Algorithm for Clustering Categorical Data. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science, vol 8347. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53917-6_6


DOI: https://doi.org/10.1007/978-3-642-53917-6_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53916-9
Online ISBN: 978-3-642-53917-6
eBook Packages: Computer Science, Computer Science (R0)
