Summary
A brief overview is given of the problem of validation in classification studies. Attention is concentrated on the specification of appropriate null models for data, with respect to which one may assess some cluster structure that has been obtained as the output of a clustering algorithm. In addition to standard null models, a discussion is given of ‘data-influenced’ null models, in which the precise form of the null hypothesis is influenced by characteristics of the data set under investigation. To illustrate the importance of specifying relevant null models, the behaviour of U-statistics under these null models is used to assess individual clusters found when data were classified using some standard clustering criteria implemented in an agglomerative algorithm.
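A minimal Python sketch of the kind of procedure the Summary describes. It is not the author's exact method, and several choices are illustrative assumptions. The U-statistic for a cluster counts (within, between) pairs of Euclidean dissimilarities in which the within-cluster value is larger. Ward's sum-of-squares criterion stands in for "some standard clustering criteria". The null model is uniform over the bounding box of the data, which is data-influenced only in that its region is estimated from the observed points; a different region, such as the convex hull, could be substituted. Observed and simulated clusters are compared only at the same size. All function names (`u_statistic`, `ward_clusters`, `uniform_box_null`, `monte_carlo_p`) are hypothetical.

```python
import math
import random
from itertools import combinations


def u_statistic(points, cluster):
    """Mann-Whitney-type U for one cluster: the number of (within, between)
    pairs of Euclidean dissimilarities in which the within-cluster value is
    the larger (ties count 1/2). Small U suggests a compact, isolated cluster."""
    inside = sorted(cluster)
    outside = [i for i in range(len(points)) if i not in cluster]
    within = [math.dist(points[i], points[j]) for i, j in combinations(inside, 2)]
    between = [math.dist(points[i], points[j]) for i in inside for j in outside]
    u = 0.0
    for w in within:
        for b in between:
            if w > b:
                u += 1.0
            elif w == b:
                u += 0.5
    return u


def ward_clusters(points):
    """Naive agglomerative clustering with Ward's sum-of-squares criterion.
    Returns every cluster (a frozenset of point indices) formed by a merge."""
    clusters = {i: ([i], list(points[i])) for i in range(len(points))}
    formed, next_id = [], len(points)
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            (ma, ca), (mb, cb) = clusters[a], clusters[b]
            # Increase in within-group sum of squares if a and b are merged.
            cost = len(ma) * len(mb) / (len(ma) + len(mb)) * math.dist(ca, cb) ** 2
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters.pop(a), clusters.pop(b)
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters[next_id] = (ma + mb, centroid)
        formed.append(frozenset(ma + mb))
        next_id += 1
    return formed


def uniform_box_null(points, rng):
    """Assumed data-influenced null: n points uniform over the data's bounding box."""
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    return [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs)) for _ in points]


def monte_carlo_p(points, cluster, n_sim=99, seed=0):
    """Monte Carlo p-value: how often does null data, clustered the same way,
    yield a same-sized cluster whose U is at least as small as the observed one?
    Simulations whose hierarchy has no cluster of that size are not counted."""
    rng = random.Random(seed)
    m = len(cluster)
    observed = u_statistic(points, cluster)
    used = as_extreme = 0
    for _ in range(n_sim):
        sim = uniform_box_null(points, rng)
        same_size = [c for c in ward_clusters(sim) if len(c) == m]
        if not same_size:
            continue
        used += 1
        if min(u_statistic(sim, c) for c in same_size) <= observed:
            as_extreme += 1
    return (as_extreme + 1) / (used + 1)


# Usage: five tightly grouped points plus fifteen well-separated ones.
tight = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.05, 0.05)]
spread = [(5 + i, 5 + i % 3) for i in range(15)]
data = tight + spread
print(u_statistic(data, set(range(5))))  # 0.0: every within value < every between value
print(monte_carlo_p(data, set(range(5))))
```

Taking the minimum U over same-sized clusters in each simulated data set mimics the fact that the observed cluster was itself selected by the algorithm. Comparing against an arbitrary null cluster would ignore this selection and overstate significance.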
Dedicated to the memory of Richard Dubes
© 1996 Springer-Verlag Berlin · Heidelberg
About this paper
Cite this paper
Gordon, A.D. (1996). Null Models in Cluster Validation. In: Gaul, W., Pfeifer, D. (eds) From Data to Knowledge. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-79999-0_3
DOI: https://doi.org/10.1007/978-3-642-79999-0_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60354-2
Online ISBN: 978-3-642-79999-0
eBook Packages: Springer Book Archive