Clustering of Mixed-Type Data Considering Concept Hierarchies

  • Sahar Behzadi
  • Nikola S. Müller
  • Claudia Plant
  • Christian Böhm
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11439)

Abstract

Most clustering algorithms have been designed for purely numerical or purely categorical data sets, while many applications today generate mixed data. This raises the question of how to integrate various types of attributes so that objects can be grouped efficiently without loss of information. It is already well understood that simply converting categorical attributes into a numerical domain is not sufficient, since relationships between values, such as a certain order, are artificially introduced. Concept trees summarize categorical attributes by leveraging the natural conceptual hierarchy among categorical values. In this paper we propose the algorithm ClicoT (CLustering mixed-type data Including COncept Trees), which is based on the Minimum Description Length (MDL) principle. Exploiting these concept hierarchies, ClicoT integrates categorical and numerical attributes by means of an MDL-based objective function. The result of ClicoT is readily interpretable, since concept trees provide insight into the categorical data. Extensive experiments on synthetic and real data sets illustrate that ClicoT is noise-robust and yields well-interpretable results in a short runtime.
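
To make the MDL idea in the abstract concrete, the following is a minimal sketch of a generic description-length score for a clustering of mixed-type data. It is not the ClicoT objective function and it does not model concept trees; it only illustrates how numerical and categorical attributes can be charged in a common unit (bits). All function names, the Gaussian model for the numerical attribute, the frequency-based code for the categorical attribute, and the 0.5 * log2(n) charge per parameter are assumptions made for this illustration.

```python
import math
from collections import Counter

def categorical_cost_bits(values):
    """Bits needed to encode the values with a code fitted to their own frequencies."""
    n = len(values)
    return -sum(c * math.log2(c / n) for c in Counter(values).values())

def numerical_cost_bits(xs, min_std=1e-3):
    """Negative log2-likelihood of xs under a Gaussian fitted to xs.

    This is a differential cost, so it can be negative; only differences
    between competing clusterings are meaningful.
    """
    n = len(xs)
    mean = sum(xs) / n
    # Floor the standard deviation so that constant attributes do not divide by zero.
    std = max(math.sqrt(sum((x - mean) ** 2 for x in xs) / n), min_std)
    const = 0.5 * math.log2(2 * math.pi * std ** 2)
    return sum(const + (x - mean) ** 2 / (2 * std ** 2) * math.log2(math.e) for x in xs)

def total_cost_bits(clusters):
    """Description length of a clustering; each cluster is a list of (number, category) pairs."""
    total_n = sum(len(c) for c in clusters)
    cost = 0.0
    for cluster in clusters:
        n = len(cluster)
        xs = [x for x, _ in cluster]
        cats = [v for _, v in cluster]
        # Assumed parameter count: mean, std, and (k - 1) category probabilities.
        n_params = 2 + (len(set(cats)) - 1)
        cost += numerical_cost_bits(xs)          # numerical attribute
        cost += categorical_cost_bits(cats)      # categorical attribute
        cost += n * math.log2(total_n / n)       # cluster membership of each object
        cost += 0.5 * n_params * math.log2(n)    # model parameters
    return cost

# Hypothetical toy data: two well-separated groups.
group_a = [(1.0, "a"), (1.1, "a"), (0.9, "a"), (1.2, "a"), (0.8, "b")]
group_b = [(5.0, "c"), (5.1, "c"), (4.9, "c"), (5.2, "c"), (4.8, "d")]

split_cost = total_cost_bits([group_a, group_b])
merged_cost = total_cost_bits([group_a + group_b])
print(f"two clusters: {split_cost:.2f} bits, one cluster: {merged_cost:.2f} bits")
```

On this toy data the two-cluster description is cheaper than the single-cluster one, which is the sense in which an MDL-style score prefers the clustering that compresses the data best. Because both attribute types are measured in bits, no weighting parameter between them is needed in this sketch.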

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Sahar Behzadi (1, corresponding author)
  • Nikola S. Müller (2)
  • Claudia Plant (1, 3)
  • Christian Böhm (4)
  1. Faculty of Computer Science, Data Mining, University of Vienna, Vienna, Austria
  2. Institute of Computational Biology, Helmholtz Zentrum München, Munich, Germany
  3. ds:UniVie, University of Vienna, Vienna, Austria
  4. Ludwig-Maximilians-Universität München, Munich, Germany
