Clustering Mixed Datasets by Using Similarity Features

  • Amir Ahmad
  • Santosh Kumar Ray
  • Ch. Aswani Kumar
Conference paper
Part of the Lecture Notes on Data Engineering and Communications Technologies book series (LNDECT, volume 39)


Clustering datasets that contain both numeric and nominal features is challenging because the two feature types call for different similarity measures. In this paper, we propose a method to transform a mixed dataset into a numeric dataset. The method uses a similarity measure for mixed datasets, together with a randomly selected set of data objects from the given mixed dataset, to generate numeric similarity features. A clustering algorithm for purely numeric datasets is then applied to the newly generated numeric dataset to produce clusters. A comparative study with other clustering algorithms demonstrated the superior performance of the proposed clustering approach.
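The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the similarity measure, so a simple Gower-style similarity stands in for it, and a plain Lloyd-style k-means stands in for the numeric clustering algorithm. The function names (`mixed_similarity`, `similarity_features`, `kmeans`), the toy data, and the parameter `n_landmarks` are all hypothetical.

```python
import random

def mixed_similarity(a, b, numeric_idx, ranges):
    """Gower-style similarity in [0, 1]; an assumed stand-in, not the paper's measure."""
    total = 0.0
    for j, (x, y) in enumerate(zip(a, b)):
        if j in numeric_idx:
            r = ranges[j]
            # Numeric feature: 1 minus the range-normalised absolute difference.
            total += 1.0 - (abs(x - y) / r if r > 0 else 0.0)
        else:
            # Nominal feature: 1 on an exact match, 0 otherwise.
            total += 1.0 if x == y else 0.0
    return total / len(a)

def similarity_features(data, numeric_idx, n_landmarks, seed=0):
    """Map each object to its similarities to randomly selected data objects."""
    rng = random.Random(seed)
    landmarks = rng.sample(data, n_landmarks)
    ranges = {j: max(row[j] for row in data) - min(row[j] for row in data)
              for j in numeric_idx}
    features = [[mixed_similarity(row, lm, numeric_idx, ranges) for lm in landmarks]
                for row in data]
    return features, landmarks

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd-style k-means on numeric vectors; returns one label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda c: sum((pi - ci) ** 2 for pi, ci in zip(p, centers[c])))
                  for p in points]
        # Update step: move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy mixed dataset (made up for illustration): one numeric and two nominal features.
data = [
    (1.0, "red", "a"), (1.2, "red", "a"), (0.9, "red", "b"),
    (9.8, "blue", "c"), (10.1, "blue", "c"), (10.0, "blue", "d"),
]
features, landmarks = similarity_features(data, numeric_idx={0}, n_landmarks=3)
labels = kmeans(features, k=2)
print(features)
print(labels)
```

Each object ends up with one numeric feature per selected data object, so the number of selected objects sets the dimensionality of the transformed dataset. Any numeric clustering algorithm could replace the `kmeans` helper here.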



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Amir Ahmad (1)
  • Santosh Kumar Ray (2)
  • Ch. Aswani Kumar (3)
  1. College of Information Technology, United Arab Emirates University, Al Ain, UAE
  2. Department of Information Technology, Khawarizmi International College, Al Ain, UAE
  3. School of Information Technology and Engineering, VIT University, Vellore, India
