Abstract

This paper addresses the problem of assessing the validity of the clusters found by a clustering algorithm. Determining the “true” number of “natural” clusters has often been considered the central problem of cluster validation. Many stopping rules have been proposed in the research literature, but most of them apply only to classical data (qualitative or quantitative). In this paper we investigate how to determine the number of clusters for symbolic objects described by interval variables. We consider five classical methods and two hypothesis tests based on the Poisson point process, extend these methods to interval data, and apply them to the meteorological stations data set.
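To make the idea of a stopping rule on interval data concrete, the following is a minimal sketch, not the chapter's procedure. The abstract does not name the five classical methods, so the Calinski and Harabasz index is used here only as one well-known example of such a rule. The encoding of each interval as a (midpoint, half-length) pair, the toy temperature ranges, the hand-made partitions, and all function names are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def to_point(obj: Sequence[Interval]) -> List[float]:
    """Encode each interval [a, b] as (midpoint, half-length).

    This encoding is an illustrative assumption, not the chapter's extension.
    """
    coords: List[float] = []
    for a, b in obj:
        coords.extend([(a + b) / 2.0, (b - a) / 2.0])
    return coords


def centroid(points: List[List[float]]) -> List[float]:
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(p, q))


def calinski_harabasz(objects: Sequence[Sequence[Interval]],
                      labels: Sequence[int]) -> float:
    """Return [B / (k - 1)] / [W / (n - k)] for a given partition.

    B is the between-cluster and W the within-cluster sum of squares,
    computed on the encoded interval data.
    """
    points = [to_point(o) for o in objects]
    n = len(points)
    clusters: Dict[int, List[List[float]]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("the index needs 2 <= k < n clusters")
    g = centroid(points)
    between = 0.0
    within = 0.0
    for members in clusters.values():
        c = centroid(members)
        between += len(members) * sq_dist(c, g)
        within += sum(sq_dist(p, c) for p in members)
    if within == 0.0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical data: six objects described by one interval variable
# (say, a temperature range), forming two visibly separated groups.
objects = [
    [(0.0, 2.0)], [(1.0, 3.0)], [(0.0, 3.0)],
    [(10.0, 12.0)], [(11.0, 13.0)], [(10.0, 13.0)],
]

# Hand-made candidate partitions into two and three clusters.
ch_two = calinski_harabasz(objects, [0, 0, 0, 1, 1, 1])
ch_three = calinski_harabasz(objects, [0, 0, 1, 2, 2, 2])

# The stopping rule keeps the number of clusters with the largest index.
best_k = 2 if ch_two > ch_three else 3
print(best_k, ch_two, ch_three)
```

In practice the candidate partitions would come from a clustering algorithm run for several values of k rather than being written by hand, and other interval representations or dissimilarities could replace the encoding assumed here.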




© 2007 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Hardy, A., Baune, J. (2007). Clustering and Validation of Interval Data. In: Brito, P., Cucumel, G., Bertrand, P., de Carvalho, F. (eds) Selected Contributions in Data Analysis and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73560-1_7
