Determination of the Number of Clusters for Symbolic Objects Described by Interval Variables

  • André Hardy
  • Pascale Lallemand
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

One of the important problems in cluster analysis is the objective assessment of the validity of the clusters found by a clustering algorithm. Determining the “best” number of clusters has often been called the central problem of cluster validation. Numerous methods for determining the number of clusters have been proposed, but most of them apply only to classical data (qualitative, quantitative). In this paper we investigate the determination of the number of clusters for symbolic objects described by interval variables. We define a notion of convex hull for a set of symbolic objects of interval type. This yields classical quantitative data, so the classical rules for determining the number of clusters can be used. We consider the Hypervolumes test and the best stopping rules from the Milligan and Cooper (1985) study.

Two symbolic clustering methods are also used: Scluster (Bock et al. (2001)), a dynamic partitioning procedure, and a monothetic divisive clustering algorithm (Chavent (1997)). Two data sets illustrate the methods. The first one is an artificially generated data set. The second one is a real data set.
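The general idea of recoding interval data as classical quantitative data and then applying a classical stopping rule can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of the paper: the (midpoint, half-length) encoding in `to_points` is an assumed recoding (the abstract does not specify one), a small deterministic k-means stands in for the symbolic clustering methods, the Calinski and Harabasz (1974) index is used as one example of a stopping rule examined by Milligan and Cooper (1985), and the toy data and all function names are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+

def to_points(objects):
    """ASSUMED recoding: each interval object [(lo, hi), ...] becomes
    the classical vector (midpoints..., half-lengths...)."""
    return [tuple((lo + hi) / 2 for lo, hi in obj) +
            tuple((hi - lo) / 2 for lo, hi in obj) for obj in objects]

def centroid(points):
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def kmeans(points, k, iters=50):
    """Stand-in partitioning method with deterministic farthest-point seeding."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j]))
                  for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(centroid(members) if members else centers[j])
        if updated == centers:  # assignments are stable
            break
        centers = updated
    return labels

def calinski_harabasz(points, labels):
    """(between-cluster SS / (k - 1)) / (within-cluster SS / (n - k))."""
    n, clusters = len(points), sorted(set(labels))
    k, g = len(clusters), centroid(points)
    between = within = 0.0
    for j in clusters:
        members = [p for p, lab in zip(points, labels) if lab == j]
        c = centroid(members)
        between += len(members) * dist(c, g) ** 2
        within += sum(dist(p, c) ** 2 for p in members)
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical toy data: nine objects, two interval variables each.
objects = [
    [(0, 1), (0, 1)], [(0.5, 1.5), (0, 1.2)], [(0, 1.2), (0.4, 1.6)],
    [(10, 11), (0, 1)], [(10.5, 11.5), (0.2, 1.2)], [(10, 11.4), (0.5, 1.5)],
    [(5, 6), (10, 11)], [(5.4, 6.4), (10, 11.2)], [(5, 6.2), (10.5, 11.5)],
]
points = to_points(objects)
partitions = {k: kmeans(points, k) for k in range(2, 6)}
scores = {k: calinski_harabasz(points, partitions[k]) for k in partitions}
best_k = max(scores, key=scores.get)  # rule: keep the k with the largest index
print(best_k)
```

Any other stopping rule could replace `calinski_harabasz` here without changing the rest of the sketch, since each rule only needs the recoded points and a candidate partition.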

Keywords

Interval Variable; Symbolic Data; Beef Tallow; Homogeneous Poisson Process; Classical Cluster
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. BAKER, F. B. and HUBERT, L. J. (1976): Measuring the power of hierarchical cluster analysis. Journal of the American Statistical Association, 70, 31–38.
  2. BEALE, E. M. L. (1969): Euclidean cluster analysis. Bulletin of the International Statistical Institute, 43 (2), 92–94.
  3. BOCK, H.-H. and DIDAY, E. (eds.) (2000): Analysis of Symbolic Data, Exploratory methods for extracting statistical information from complex data. Springer Verlag.
  4. BOCK, H.-H. et al. (2001): Report of the Meeting ASSO - WP6.2 Classification group (Munich). Technical report.
  5. CALINSKI, T. and HARABASZ, J. (1974): A dendrite method for cluster analysis. Communications in Statistics, 3, 1–27.
  6. CHAVENT, M. (1997): Analyse des données symboliques - Une méthode divisive de classification. Thèse, Université Paris Dauphine.
  7. CHAVENT, M. (2000): Criterion-Based Divisive Clustering for Symbolic data. In: H.-H. Bock, E. Diday (eds.): Analysis of Symbolic Data. Springer Verlag, 299–311.
  8. CHOUAKRIA, A., CAZES, P., and DIDAY, E. (2000): Symbolic Principal Component Analysis. In: H.-H. Bock, E. Diday (eds.): Analysis of Symbolic Data. Springer Verlag, 200–212.
  9. DIDAY, E. (1971): La méthode des Nuées Dynamiques. Revue de Statistique Appliquée, 19, 2, 19–34.
  10. DUDA, R. O. and HART, P. E. (1973): Pattern Classification and Scene Analysis. Wiley, New York.
  11. GOWDA, K. C. and DIDAY, E. (1994): Symbolic clustering algorithms using similarity and dissimilarity measures. In: E. Diday, Y. Lechevallier, M. Schader, P. Bertrand, B. Burtschy (eds.): New approaches in classification and data analysis. Springer, Berlin, 414–422.
  12. HARDY, A. and RASSON, J.-P. (1982): Une nouvelle approche des problèmes de classification automatique. Statistique et Analyse des Données, 7, 41–56.
  13. HARDY, A. (1994): An examination of procedures for determining the number of clusters in a data set. In: E. Diday, Y. Lechevallier, M. Schader, P. Bertrand, B. Burtschy (eds.): New approaches in classification and data analysis. Springer, Berlin, 178–185.
  14. HARDY, A. (1996): On the number of clusters. Computational Statistics & Data Analysis, 23, 83–96.
  15. HARDY, A. and DESCHAMPS, J.F. (1999): Apport du critère des Hypervolumes à la validation en classification. In: F. Le Ber, J.-F. Mari, A. Napoli, A. Simon (eds.): Actes des Septièmes Rencontres de la Société Francophone de Classification. Nancy, 201–207.
  16. HUBERT, L. J. and LEVIN, J. R. (1976): A general statistical framework for assessing categorical clustering in free recall. Psychological Bulletin, 83, 1072–1080.
  17. ICHINO, M. and YAGUCHI, H. (1994): Generalized Minkowski Metrics for Mixed Feature Type Data Analysis. IEEE Transactions on Systems, Man and Cybernetics, 24, 698–708.
  18. MILLIGAN, G.W. and COOPER, M.C. (1985): An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50, 159–179.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • André Hardy (1)
  • Pascale Lallemand (1)
  1. Department of Mathematics, University of Namur, Namur, Belgium