Cluster Analysis: An Application to a Real Mixed-Type Data Set

Caruso, G.; Gattone, S. A.; Balzanella, A.; Di Battista, T.

doi:10.1007/978-3-030-00084-4_27

Chapter
First Online: 13 October 2018

Part of the book series: Studies in Systems, Decision and Control (SSDC, volume 179)

Abstract

When multivariate data are available, it is crucial to summarize them so as to extract appropriate and useful information and, consequently, to make proper decisions. Cluster analysis fully meets this requirement: it partitions data into meaningful groups such that both the similarity within each group and the dissimilarity between groups are maximized. Because of its usefulness, clustering is applied in a broad variety of contexts, which explains its appeal in many disciplines. Most existing clustering approaches are limited to either numerical or categorical data only. However, since data sets composed of mixed types of attributes are very common in real-life applications, clustering them is well worth the effort. In this paper we therefore stress the importance of this approach by carrying out an application on a real-world mixed-type data set.
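
To make the idea of a mixed-type dissimilarity concrete, the following is a minimal sketch and not necessarily the method used in the chapter. It assumes a k-prototypes-style measure: squared Euclidean distance on the numerical attributes plus a weight times the number of mismatching categorical attributes. The function names, the weight `gamma`, and the toy records are all hypothetical and chosen only for illustration.

```python
def mixed_dissimilarity(a_num, a_cat, b_num, b_cat, gamma=1.0):
    """Squared Euclidean distance on the numerical parts plus
    gamma times the number of categorical mismatches.
    gamma is an assumed tuning weight, not a value from the chapter."""
    numeric = sum((x - y) ** 2 for x, y in zip(a_num, b_num))
    mismatches = sum(1 for x, y in zip(a_cat, b_cat) if x != y)
    return numeric + gamma * mismatches


def nearest_prototype(record, prototypes, gamma=1.0):
    """Return the index of the prototype closest to the record."""
    r_num, r_cat = record
    distances = [
        mixed_dissimilarity(r_num, r_cat, p_num, p_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return distances.index(min(distances))


# Hypothetical toy data: two scaled numerical attributes
# and two categorical attributes per record.
records = [
    ((0.1, 0.2), ("a", "x")),
    ((0.2, 0.1), ("a", "x")),
    ((0.9, 0.8), ("b", "y")),
    ((0.8, 0.9), ("b", "x")),
]

# Two hand-picked prototypes; a full algorithm would update them iteratively.
prototypes = [
    ((0.15, 0.15), ("a", "x")),
    ((0.85, 0.85), ("b", "y")),
]

labels = [nearest_prototype(r, prototypes) for r in records]
print(labels)
```

The sketch performs a single assignment step only. A complete procedure would alternate between assigning records and recomputing prototypes (means for numerical attributes, modes for categorical ones) until the assignments stop changing.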

Author information

Authors and Affiliations

University G. d’Annunzio, Pescara, Italy
G. Caruso, S. A. Gattone & T. Di Battista
University of Campania Luigi Vanvitelli, Caserta, Italy
A. Balzanella

Corresponding author

Correspondence to G. Caruso.

Editor information

Editors and Affiliations

Faculty of Mathematics and Computer Science, Ovidius University of Constanţa, Constanţa, Romania
Cristina Flaut
Department of Mathematics and Physics, Faculty of Military Technology, University of Defence, Brno, Czech Republic
Šárka Hošková-Mayerová
Faculty of History and Political Science, Ovidius University of Constanţa, Constanţa, Romania
Daniel Flaut

About this chapter

Cite this chapter

Caruso, G., Gattone, S.A., Balzanella, A., Di Battista, T. (2019). Cluster Analysis: An Application to a Real Mixed-Type Data Set. In: Flaut, C., Hošková-Mayerová, Š., Flaut, D. (eds) Models and Theories in Social Systems. Studies in Systems, Decision and Control, vol 179. Springer, Cham. https://doi.org/10.1007/978-3-030-00084-4_27

DOI: https://doi.org/10.1007/978-3-030-00084-4_27
Published: 13 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00083-7
Online ISBN: 978-3-030-00084-4
eBook Packages: Intelligent Technologies and Robotics (R0)
