Abstract
This paper addresses the problem of assessing the validity of the clusters found by a clustering algorithm. Determining the “true” number of “natural” clusters has often been considered the central problem of cluster validation. Many stopping rules have been proposed in the research literature, but most of them apply only to classical (qualitative or quantitative) data. In this paper we investigate how to determine the number of clusters for symbolic objects described by interval variables. We consider five classical methods and two hypothesis tests based on the Poisson point process, extend them to interval data, and apply them to the meteorological stations data set.
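To make the idea of a stopping rule for interval data concrete, the sketch below scores candidate partitions with the Calinski-Harabasz index, one widely used rule for choosing the number of clusters. The abstract does not say how the methods are extended to intervals, so the choices here are assumptions: each interval is represented by its lower and upper bounds, objects are compared with squared Euclidean distance on those bounds, and the helper names (`to_vector`, `calinski_harabasz`) and the toy `stations` data are hypothetical and not taken from the chapter or from the meteorological stations data set.

```python
from math import fsum


def to_vector(obj):
    """Flatten an interval object [(lo, hi), ...] into [lo1, hi1, lo2, hi2, ...].

    Assumption: an interval is fully described by its two bounds.
    """
    return [bound for interval in obj for bound in interval]


def centroid(vectors):
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [fsum(column) / n for column in zip(*vectors)]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return fsum((a - b) ** 2 for a, b in zip(u, v))


def calinski_harabasz(objects, labels):
    """Calinski-Harabasz index: (B / (k - 1)) / (W / (n - k)).

    B is the between-cluster and W the within-cluster sum of squares,
    both computed on the bound vectors of the interval objects.
    """
    vectors = [to_vector(obj) for obj in objects]
    n = len(vectors)
    clusters = {}
    for vector, label in zip(vectors, labels):
        clusters.setdefault(label, []).append(vector)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("the index is defined only for 2 <= k < n")

    overall = centroid(vectors)
    between = 0.0
    within = 0.0
    for members in clusters.values():
        center = centroid(members)
        between += len(members) * sq_dist(center, overall)
        within += fsum(sq_dist(vector, center) for vector in members)
    if within == 0.0:
        raise ValueError("within-cluster sum of squares is zero")
    return (between / (k - 1)) / (within / (n - k))


# Toy data (invented): six objects, each described by two interval variables.
stations = [
    [(0, 2), (10, 12)],
    [(1, 3), (11, 13)],
    [(0, 3), (10, 13)],
    [(20, 22), (30, 33)],
    [(21, 23), (31, 33)],
    [(20, 23), (30, 32)],
]

# Candidate partitions, e.g. obtained by cutting a hierarchy at k = 2 and k = 3.
partitions = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 2, 2, 2],
}

scores = {k: calinski_harabasz(stations, labels) for k, labels in partitions.items()}
best_k = max(scores, key=scores.get)  # the rule keeps the k with the largest index
```

On this toy data the two-cluster partition scores higher than the three-cluster one, so the rule selects two clusters. A different interval dissimilarity (for example a Hausdorff-type distance) would change the sums of squares and possibly the selected number of clusters.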
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Hardy, A., Baune, J. (2007). Clustering and Validation of Interval Data. In: Brito, P., Cucumel, G., Bertrand, P., de Carvalho, F. (eds) Selected Contributions in Data Analysis and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73560-1_7
DOI: https://doi.org/10.1007/978-3-540-73560-1_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73558-8
Online ISBN: 978-3-540-73560-1
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)