Finding Groups in Ordinal Data: An Examination of Some Clustering Procedures

Walesiak, Marek; Dudek, Andrzej

doi:10.1007/978-3-642-10745-0_19

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

This article evaluates cluster analysis procedures for ordinal data, using data simulated with the cluster.Gen function of the clusterSim package for R. The procedures combine the GDM distance for ordinal data (see Jajuga et al. 2003; Walesiak 1993, 2006) with nine clustering methods and eight internal cluster quality indices for determining the number of clusters. Seventy-two clustering procedures are evaluated on simulated data originating from a variety of models. The models have a known cluster structure and differ in the number of true dimensions, the number of categories for each variable, the density and shape of the clusters, the number of true clusters, and the number of noisy variables. Each clustering result is compared with the known cluster structure of the model using the corrected Rand index (Hubert and Arabie 1985).
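
The comparison of a clustering result with a known cluster structure can be illustrated with a minimal Python sketch of the corrected Rand index in its standard Hubert and Arabie (1985) form. This is not the authors' code (the study itself works in R with clusterSim); the function name corrected_rand_index and the label vectors in the usage note are assumptions made purely for illustration.

```python
from collections import Counter
from math import comb


def corrected_rand_index(labels_true, labels_pred):
    """Corrected (adjusted) Rand index between two partitions of the same objects.

    Hypothetical helper for illustration; labels may be any hashable values.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both partitions must label the same objects")
    n = len(labels_true)
    if n < 2:
        # No object pairs exist, so the partitions cannot disagree.
        return 1.0

    # Contingency table cells n_ij and the row and column margins.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Numbers of object pairs placed together in both partitions,
    # in the true partition, and in the predicted partition.
    together_both = sum(comb(c, 2) for c in cells.values())
    together_true = sum(comb(c, 2) for c in rows.values())
    together_pred = sum(comb(c, 2) for c in cols.values())

    # Expected agreement under random labelling with fixed margins,
    # and the maximum attainable value of the index numerator.
    expected = together_true * together_pred / comb(n, 2)
    maximum = (together_true + together_pred) / 2

    if maximum == expected:
        # Degenerate case: both partitions are a single cluster
        # or both consist only of singletons.
        return 1.0
    return (together_both - expected) / (maximum - expected)
```

The index equals 1 for identical partitions regardless of how the clusters are labelled, and is close to 0 for agreement at chance level. For instance, corrected_rand_index([0, 0, 1, 1], [0, 0, 1, 2]) evaluates to about 0.571.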

References

Anderberg, M. R. (1973). Cluster analysis for applications. New York, San Francisco, London: Academic Press.
Gordon, A. D. (1999). Classification. London: Chapman & Hall/CRC.
Hubert, L. J., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.
Jajuga, K., & Walesiak, M. (2000). Standardisation of data set under different measurement scales. In R. Decker & W. Gaul (Eds.), Classification and information processing at the turn of the millennium (pp. 105–112). Berlin, Heidelberg: Springer-Verlag.
Jajuga, K., Walesiak, M., & Bąk, A. (2003). On the general distance measure. In M. Schwaiger & O. Opitz (Eds.), Exploratory data analysis in empirical research (pp. 104–109). Berlin, Heidelberg: Springer-Verlag.
Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: An introduction to cluster analysis. New York: Wiley (second edition: 2005).
Kendall, M. G. (1966). Discrimination and classification. In P. R. Krishnaiah (Ed.), Multivariate analysis I (pp. 165–185). New York: Academic Press.
Macnaughton-Smith, P., Williams, W. T., Dale, M. B., & Mockett, L. G. (1964). Dissimilarity analysis: A new technique of hierarchical sub-division. Nature, 202, 1034–1035.
Milligan, G. W. (1985). An algorithm for generating artificial test clusters. Psychometrika, 50(1), 123–127.
Milligan, G. W. (1996). Clustering validation: Results and implications for applied analyses. In P. Arabie, L. J. Hubert & G. de Soete (Eds.), Clustering and classification (pp. 341–375). Singapore: World Scientific.
Milligan, G. W., & Cooper, M. C. (1988). A study of standardization of variables in cluster analysis. Journal of Classification, 5(2), 181–204.
Podani, J. (1999). Extending Gower’s general coefficient of similarity to ordinal characters. Taxon, 48, 331–340.
Qiu, W., & Joe, H. (2006). Generation of random clusters with specified degree of separation. Journal of Classification, 23(2), 315–334.
Soffritti, G. (2003). Identifying multiple cluster structures in a data matrix. Communications in Statistics. Simulation and Computation, 32(4), 1151–1177.
Stevens, S. S. (1959). Measurement, psychophysics and utility. In C. W. Churchman & P. Ratoosh (Eds.), Measurement. Definitions and theories (pp. 18–63). New York: Wiley.
Tibshirani, R., & Walther, G. (2005). Cluster validation by predicting strength. Journal of Computational and Graphical Statistics, 14(3), 511–528.
Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society, Series B, 63(2), 411–423.
Walesiak, M. (1993). Statystyczna analiza wielowymiarowa w badaniach marketingowych [Multivariate statistical analysis in marketing research]. Wrocław University of Economics, Research Papers no. 654.
Walesiak, M. (2006). Uogólniona miara odległości w statystycznej analizie wielowymiarowej [The generalised distance measure in multivariate statistical analysis]. Wrocław: Wydawnictwo AE.
Walesiak, M., & Dudek, A. (2009). clusterSim package, http://www.R-project.org/.

Author information

Authors and Affiliations

Wrocław University of Economics, Nowowiejska 3, 58-500, Jelenia Góra, Poland
Marek Walesiak

Corresponding author

Correspondence to Marek Walesiak.

Editor information

Editors and Affiliations

LS für BWL, insb. Finanzwirtschaft und Finanzdienstleistungen, TU Dresden, Helmholtzstr. 10, Dresden, 01062, Germany
Hermann Locarek-Junge
FG Computergestützte Statistik, Univ. Dortmund, Vogelpothsweg 87, Dortmund, 44227, Germany
Claus Weihs

About this paper

Cite this paper

Walesiak, M., Dudek, A. (2010). Finding Groups in Ordinal Data: An Examination of Some Clustering Procedures. In: Locarek-Junge, H., Weihs, C. (eds) Classification as a Tool for Research. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10745-0_19

DOI: https://doi.org/10.1007/978-3-642-10745-0_19
Published: 03 May 2010
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10744-3
Online ISBN: 978-3-642-10745-0
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)