Categorical Data Clustering Using the Combinations of Attribute Values

  • Conference paper
Computational Science and Its Applications – ICCSA 2008 (ICCSA 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5073)

Abstract

Clustering is an important technique for exploratory data analysis. While most earlier clustering algorithms focused on numerical data, real-world problems and data mining applications frequently involve categorical data. Here, we propose a new clustering algorithm for categorical data that is based on the frequency of attribute value combinations. Our algorithm finds all the combinations of attribute values in a record, each of which represents a subset of the record's attribute values, and then groups the records using the frequency of these combinations. Because our algorithm considers all the subsets of attribute values in a record, records in a cluster have not only similar attribute value sets but also strongly associated attribute values. We evaluated our algorithm on real and synthetic data sets, and the experimental results demonstrate its effectiveness.
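
The combination-counting idea in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' algorithm: records are assumed to be tuples of categorical values, a "combination" is taken to be any non-empty subset of a record's (attribute index, value) pairs, and the grouping rule (assign each record to its most frequent combination of at least `min_size` values) is a hypothetical stand-in, since the abstract does not say how frequencies are turned into clusters. All function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def attribute_value_combinations(record, min_size=1):
    """Yield every subset (of size >= min_size) of a record's (index, value) pairs.

    Pairing each value with its attribute index keeps equal values in
    different columns distinct.
    """
    items = tuple(enumerate(record))
    for size in range(min_size, len(items) + 1):
        yield from combinations(items, size)


def combination_frequencies(records):
    """Count how many records contain each attribute-value combination."""
    counts = Counter()
    for record in records:
        counts.update(attribute_value_combinations(record))
    return counts


def group_by_frequent_combination(records, min_size=2):
    """Hypothetical grouping rule (an assumption, not the paper's):
    put each record in the cluster keyed by its most frequent combination
    of at least min_size values, preferring larger combinations on ties.
    Records with fewer than min_size values fall under the empty key ().
    """
    counts = combination_frequencies(records)
    clusters = {}
    for i, record in enumerate(records):
        best = max(
            attribute_value_combinations(record, min_size),
            key=lambda combo: (counts[combo], len(combo)),
            default=(),
        )
        clusters.setdefault(best, []).append(i)
    return clusters


if __name__ == "__main__":
    records = [
        ("red", "small", "round"),
        ("red", "small", "square"),
        ("blue", "large", "square"),
        ("blue", "large", "round"),
    ]
    for key, members in group_by_frequent_combination(records).items():
        print(key, members)
```

On these four toy records, the two red/small records end up under the key `((0, 'red'), (1, 'small'))` and the two blue/large records under `((0, 'blue'), (1, 'large'))`, because each of those pairs occurs twice while every other multi-value combination occurs once. Note that a record with m attributes has 2^m - 1 non-empty combinations, so exhaustive enumeration as sketched here grows exponentially with the number of attributes.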

Editor information

Osvaldo Gervasi, Beniamino Murgante, Antonio Laganà, David Taniar, Youngsong Mun, Marina L. Gavrilova

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Do, HJ., Kim, JY. (2008). Categorical Data Clustering Using the Combinations of Attribute Values. In: Gervasi, O., Murgante, B., Laganà, A., Taniar, D., Mun, Y., Gavrilova, M.L. (eds) Computational Science and Its Applications – ICCSA 2008. ICCSA 2008. Lecture Notes in Computer Science, vol 5073. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69848-7_19

  • DOI: https://doi.org/10.1007/978-3-540-69848-7_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69840-1

  • Online ISBN: 978-3-540-69848-7

  • eBook Packages: Computer Science, Computer Science (R0)