The Use of Transfer Algorithm for Clustering Categorical Data

  • Conference paper
Advanced Data Mining and Applications (ADMA 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8347)

Abstract

We propose a new method for clustering categorical data. Categorical data differs in nature from numerical data, so clustering algorithms must be designed specifically for it. Our focus is on partition-based algorithms. One existing approach transforms categorical data into binary data and then applies k-means; however, this is computationally inefficient. Another approach is k-modes, which extends k-means by replacing means with modes. In this work, we show that the center-based objective function of k-modes cannot produce accurate clustering results. Instead, we propose an objective function that generalizes the k-means objective but is not based on centers. We show that it is more effective than the center-based objective and demonstrate this on real-life datasets. We also find that the proposed objective function can be optimized efficiently with a particular algorithm called the transfer algorithm. Thus our method is both efficient and effective.
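The abstract does not state the objective function itself, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It relies on a standard identity: with squared Euclidean distance, the k-means within-cluster sum of squares equals, for each cluster, the sum of pairwise distances between its members divided by the cluster size, which requires no center. The sketch assumes that the categorical generalization simply substitutes Hamming distance, and it uses a naive transfer step that moves one record at a time to whichever cluster lowers the objective. The names `hamming`, `pairwise_objective`, `transfer_pass`, and `transfer_cluster` are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def pairwise_objective(data, labels, k):
    """Center-free objective (assumed form): for each cluster, the sum of
    pairwise Hamming distances divided by the cluster size."""
    total = 0.0
    for c in range(k):
        members = [row for row, lab in zip(data, labels) if lab == c]
        if members:
            pair_sum = sum(hamming(a, b) for a, b in combinations(members, 2))
            total += pair_sum / len(members)
    return total


def transfer_pass(data, labels, k):
    """Try to move each record to the cluster that lowers the objective most.
    Returns True if at least one record was transferred."""
    moved = False
    for i in range(len(data)):
        current = labels[i]
        if labels.count(current) == 1:
            continue  # never empty a cluster
        best_c, best_val = current, pairwise_objective(data, labels, k)
        for c in range(k):
            if c == current:
                continue
            labels[i] = c  # tentative transfer
            val = pairwise_objective(data, labels, k)
            if val < best_val - 1e-12:
                best_c, best_val = c, val
        labels[i] = best_c  # commit the best move, or restore the original
        if best_c != current:
            moved = True
    return moved


def transfer_cluster(data, labels, k, max_passes=100):
    """Repeat transfer passes until no record moves (a local optimum)."""
    labels = list(labels)
    for _ in range(max_passes):
        if not transfer_pass(data, labels, k):
            break
    return labels


data = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
result = transfer_cluster(data, [0, 1, 0, 1], k=2)
print(result, pairwise_objective(data, result, 2))
```

This version recomputes the whole objective for every tentative move, which is quadratic in cluster size and is meant only to make the idea concrete. An efficient implementation would update the objective incrementally when a record is transferred, which is presumably where the efficiency claimed in the abstract comes from.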

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xiang, Z., Ji, L. (2013). The Use of Transfer Algorithm for Clustering Categorical Data. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science, vol 8347. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53917-6_6

  • DOI: https://doi.org/10.1007/978-3-642-53917-6_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-53916-9

  • Online ISBN: 978-3-642-53917-6

  • eBook Packages: Computer Science, Computer Science (R0)
