Clustering and Validation of Interval Data

Hardy, André; Baune, Joffray

doi:10.1007/978-3-540-73560-1_7

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

The paper addresses the problem of assessing the validity of the clusters found by a clustering algorithm. Determining the “true” number of “natural” clusters is often regarded as the central problem of cluster validation. Many stopping rules have been proposed in the research literature, but most of them apply only to classical (qualitative or quantitative) data. In this paper we investigate how to determine the number of clusters for symbolic objects described by interval variables. We consider five classical methods and two hypothesis tests based on the Poisson point process, extend them to interval data, and apply them to the meteorological stations data set.
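
As a rough illustration of what a stopping rule for interval data can look like, the sketch below computes the Calinski and Harabasz index, a well-known classical rule, for competing partitions of a few interval-valued objects. The abstract does not say which five methods are used or how they are extended, so everything here is an assumption: the bounds-based encoding in `to_vector`, the choice of index, the toy "station" values and the candidate partitions are all hypothetical and are not taken from the chapter.

```python
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def to_vector(obj: Sequence[Interval]) -> List[float]:
    # ASSUMPTION: an interval-valued object is encoded by concatenating the
    # lower and upper bounds of its intervals (not necessarily the chapter's choice).
    return [bound for interval in obj for bound in interval]


def centroid(points: Sequence[Sequence[float]]) -> List[float]:
    # Coordinate-wise mean of a non-empty list of vectors.
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz(objects: Sequence[Sequence[Interval]],
                      labels: Sequence[int]) -> float:
    # CH(k) = [B / (k - 1)] / [W / (n - k)], where B and W are the
    # between-cluster and within-cluster sums of squares.
    points = [to_vector(o) for o in objects]
    n = len(points)
    clusters = sorted(set(labels))
    k = len(clusters)
    if not 1 < k < n:
        raise ValueError("the index needs 1 < k < n")
    overall = centroid(points)
    between = 0.0
    within = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        center = centroid(members)
        between += len(members) * sq_dist(center, overall)
        within += sum(sq_dist(p, center) for p in members)
    if within == 0.0:
        return float("inf")  # perfectly compact clusters
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical objects: two interval variables per object
# (say, winter and summer temperature ranges).
stations = [
    [(-5, 2), (15, 25)], [(-4, 3), (16, 26)], [(-6, 1), (14, 24)],
    [(10, 18), (25, 35)], [(11, 19), (26, 36)], [(9, 17), (24, 34)],
]

# Hypothetical candidate partitions, keyed by their number of clusters.
partitions: Dict[int, List[int]] = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 2, 2, 2],
}

scores = {k: calinski_harabasz(stations, labels)
          for k, labels in partitions.items()}
best_k = max(scores, key=scores.get)  # rule: keep the k with the largest index
print(scores, best_k)
```

On this toy data the two-cluster partition scores higher than the three-cluster one, so the rule retains k = 2. Other encodings of an interval (for example midpoint and half-length) or other dissimilarities between intervals would give different values.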

Author information

Authors and Affiliations

Department of Mathematics, University of Namur, 8 Rempart de la Vierge, 5000, Namur, Belgium
André Hardy & Joffray Baune

Editor information

Editors and Affiliations

Faculty of Economics, University of Porto, Rua Dr. Roberto Frias, 4200-464, Porto, Portugal
Paula Brito
ESG UQAM, 315 East, Sainte-Catherine Street, Montreal, Quebec, H2X 3X2, Canada
Guy Cucumel
Department Lussi, ENST Bretagne, 2 rue de la Châtaigneraie, CS 17607, 35576, Cesson-Sévigné Cedex, France
Patrice Bertrand
Centre of Computer Science (CIn), Federal University of Pernambuco (UFPE), Av. Prof. Luiz Freire s/n Cidade Universitária, CEP 50740-540, Recife-PE, Brazil
Francisco de Carvalho

About this chapter

Cite this chapter

Hardy, A., Baune, J. (2007). Clustering and Validation of Interval Data. In: Brito, P., Cucumel, G., Bertrand, P., de Carvalho, F. (eds) Selected Contributions in Data Analysis and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73560-1_7

DOI: https://doi.org/10.1007/978-3-540-73560-1_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73558-8
Online ISBN: 978-3-540-73560-1
eBook Packages: Mathematics and Statistics (R0)