Abstract
Using ordinal data simulated with the cluster.Gen function of the clusterSim package for R, the article evaluates cluster analysis procedures that combine the GDM distance for ordinal data (see Jajuga et al. 2003; Walesiak 1993, 2006) with nine clustering methods and eight internal cluster quality indices for determining the number of clusters. The resulting seventy-two clustering procedures are evaluated on simulated data originating from a variety of models. The models have a known cluster structure and differ in the number of true dimensions, the number of categories for each variable, the density and shape of the clusters, the number of true clusters, and the number of noisy variables. Each clustering result is compared with the known cluster structure of the model using the corrected Rand index of Hubert and Arabie (1985).
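The comparison step can be illustrated with a minimal sketch of the corrected (adjusted) Rand index of Hubert and Arabie (1985), written here in plain Python. The function name `adjusted_rand_index` and its arguments are illustrative assumptions; this is not the clusterSim implementation used in the chapter.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Corrected Rand index between two partitions of the same objects.

    Hypothetical helper for illustration only; not part of clusterSim.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treated as perfect agreement

    # Contingency table cells and its row/column margins.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Numbers of object pairs placed together in both, or in each, partition.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    maximum = 0.5 * (sum_rows + sum_cols)        # largest attainable agreement
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions have one cluster
    return (sum_cells - expected) / (maximum - expected)


known = [0, 0, 1, 1]       # known cluster structure from the model
recovered = [1, 1, 0, 0]   # same grouping, different label names
score = adjusted_rand_index(known, recovered)
```

The index equals 1 when the recovered partition matches the known structure up to a relabelling of clusters, and it is close to 0 when the agreement is no better than chance.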
References
Anderberg, M. R. (1973). Cluster analysis for applications. New York, San Francisco, London: Academic Press.
Gordon, A. D. (1999). Classification. London: Chapman & Hall/CRC.
Hubert, L. J., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.
Jajuga, K., & Walesiak, M. (2000). Standardisation of data set under different measurement scales. In R. Decker & W. Gaul (Eds.), Classification and information processing at the turn of the millennium (pp. 105–112). Berlin, Heidelberg: Springer-Verlag.
Jajuga, K., Walesiak, M., & Bąk, A. (2003). On the general distance measure. In M. Schwaiger & O. Opitz (Eds.), Exploratory data analysis in empirical research (pp. 104–109). Berlin, Heidelberg: Springer-Verlag.
Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: An introduction to cluster analysis. New York: Wiley (second edition: 2005).
Kendall, M. G. (1966). Discrimination and classification. In P. R. Krishnaiah (Ed.), Multivariate analysis I (pp. 165–185). New York: Academic Press.
Macnaughton-Smith, P., Williams, W. T., Dale, M. B., & Mockett, L. G. (1964). Dissimilarity analysis: A new technique of hierarchical sub-division. Nature, 202, 1034–1035.
Milligan, G. W. (1985). An algorithm for generating artificial test clusters. Psychometrika, 50(1), 123–127.
Milligan, G. W. (1996). Clustering validation: results and implications for applied analyses. In P. Arabie, L. J. Hubert & G. de Soete (Eds.), Clustering and classification (pp. 341–375). Singapore: World Scientific.
Milligan, G. W., & Cooper, M. C. (1988). A study of standardization of variables in cluster analysis. Journal of Classification, 5(2), 181–204.
Podani, J. (1999). Extending Gower’s general coefficient of similarity to ordinal characters. Taxon, 48, 331–340.
Qiu, W., & Joe, H. (2006). Generation of random clusters with specified degree of separation. Journal of Classification, 23(2), 315–334.
Soffritti, G. (2003). Identifying multiple cluster structures in a data matrix. Communications in Statistics. Simulation and Computation, 32(4), 1151–1177.
Stevens, S. S. (1959). Measurement, psychophysics and utility. In C. W. Churchman & P. Ratoosh (Eds.), Measurement. Definitions and theories (pp. 18–63). New York: Wiley.
Tibshirani, R., & Walther, G. (2005). Cluster validation by predicting strength. Journal of Computational and Graphical Statistics, 14(3), 511–528.
Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society, ser. B, 63(2), 411–423.
Walesiak, M. (1993). Statystyczna analiza wielowymiarowa w badaniach marketingowych [Multivariate statistical analysis in marketing research]. Wrocław University of Economics, Research Papers no. 654.
Walesiak, M. (2006). Uogólniona miara odległości w statystycznej analizie wielowymiarowej [The generalised distance measure in multivariate statistical analysis]. Wrocław: Wydawnictwo AE.
Walesiak, M., & Dudek, A. (2009). clusterSim package, http://www.R-project.org/.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Walesiak, M., Dudek, A. (2010). Finding Groups in Ordinal Data: An Examination of Some Clustering Procedures. In: Locarek-Junge, H., Weihs, C. (eds) Classification as a Tool for Research. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10745-0_19
DOI: https://doi.org/10.1007/978-3-642-10745-0_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10744-3
Online ISBN: 978-3-642-10745-0
eBook Packages: Mathematics and Statistics (R0)