Cluster Analysis: An Application to a Real Mixed-Type Data Set

  • G. Caruso
  • S. A. Gattone
  • A. Balzanella
  • T. Di Battista
Part of the Studies in Systems, Decision and Control book series (SSDC, volume 179)


When multivariate data are available, it is crucial to summarize them so as to extract useful information and, consequently, to make sound decisions. Cluster analysis meets this requirement: it partitions data into meaningful groups such that similarity within each cluster and dissimilarity between clusters are both maximized. Because of its usefulness, clustering is applied in a broad variety of contexts, which explains its appeal in many disciplines. Most existing clustering approaches are limited to either numerical or categorical data. However, since data sets composed of mixed types of attributes are very common in real-life applications, clustering methods that can handle them are well worth pursuing. In this paper we therefore stress the importance of this approach by presenting an application to a real-world mixed-type data set.


Keywords: Cluster analysis, Numeric data, Categorical data, Mixed data, Cluster algorithm



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • G. Caruso (1, corresponding author)
  • S. A. Gattone (1)
  • A. Balzanella (2)
  • T. Di Battista (1)

  1. University G. d’Annunzio, Pescara, Italy
  2. University of Campania Luigi Vanvitelli, Caserta, Italy
