Null Models in Cluster Validation

Gordon, A. D.

doi:10.1007/978-3-642-79999-0_3

Conference paper

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Summary

A brief overview is given of the problem of validation in classification studies. Attention is concentrated on the specification of appropriate null models for data, with respect to which one may assess some cluster structure that has been obtained as the output of a clustering algorithm. In addition to standard null models, a discussion is given of ‘data-influenced’ null models, in which the precise form of the null hypothesis is influenced by characteristics of the data set under investigation. To illustrate the importance of specifying relevant null models, the behaviour of U-statistics under these null models is used to assess individual clusters found when data were classified using some standard clustering criteria implemented in an agglomerative algorithm.
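The idea of judging a clustering against a null model can be sketched as a small Monte Carlo test. The code below is an illustration under stated assumptions, not the procedure used in the paper: it takes the null model to be a uniform distribution over the bounding box of the data (a crude stand-in for a 'data-influenced' region), uses single-link agglomeration as the clustering criterion, and, for brevity, scores the whole partition with a scaled Mann-Whitney-type U statistic rather than scoring each cluster separately. All function names (u_statistic, single_link, partition_u, monte_carlo_p) and the demo data are hypothetical.

```python
import math
import random
from itertools import combinations


def u_statistic(within, between):
    """Scaled Mann-Whitney U: share of (within, between) pairs in which the
    within-cluster dissimilarity is the larger one (ties count one half)."""
    if not within or not between:
        raise ValueError("need at least one within and one between dissimilarity")
    score = sum(1.0 if w > b else 0.5 if w == b else 0.0
                for w in within for b in between)
    return score / (len(within) * len(between))


def single_link(points, k):
    """Naive single-link agglomeration down to k clusters (lists of indices)."""
    clusters = [[i] for i in range(len(points))]

    def gap(pair):
        a, b = pair
        return min(math.dist(points[i], points[j])
                   for i in clusters[a] for j in clusters[b])

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2), key=gap)
        clusters[a].extend(clusters.pop(b))  # a < b, so index a is unaffected
    return clusters


def partition_u(points, clusters):
    """U statistic comparing all within-cluster with all between-cluster distances."""
    label = {i: c for c, members in enumerate(clusters) for i in members}
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        (within if label[i] == label[j] else between).append(d)
    return u_statistic(within, between)


def monte_carlo_p(points, k, n_sim=99, seed=0):
    """Monte Carlo p-value for the observed U (requires 2 <= k < len(points)).

    Assumed null model: points uniform over the bounding box of the data.
    Small U means compact, isolated clusters, so the lower tail is used.
    """
    rng = random.Random(seed)
    lo = [min(axis) for axis in zip(*points)]
    hi = [max(axis) for axis in zip(*points)]
    observed = partition_u(points, single_link(points, k))
    hits = 0
    for _ in range(n_sim):
        null = [tuple(rng.uniform(a, b) for a, b in zip(lo, hi)) for _ in points]
        if partition_u(null, single_link(null, k)) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_sim + 1)


# Hypothetical demo data: two tight groups of six points each.
base = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (0.02, 0.08)]
demo = base + [(x + 5.0, y + 5.0) for x, y in base]
u_obs, p_value = monte_carlo_p(demo, k=2)
print(f"observed U = {u_obs:.3f}, Monte Carlo p = {p_value:.3f}")
```

Changing the null model changes the reference distribution of U, and with it the verdict on the same clustering. Replacing the bounding box with, say, the convex hull of the data would only require a different sampler inside monte_carlo_p.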

Author information

Authors and Affiliations

Mathematical Institute, University of St Andrews, North Haugh, KY16 9SS, St Andrews, Scotland
A. D. Gordon

Editor information

Editors and Affiliations

Institut für Entscheidungstheorie und Unternehmensforschung, Universität Karlsruhe (TH), Postfach 6980, 76128, Karlsruhe, Germany
Wolfgang Gaul
FB6 (Mathematik), Universität Oldenburg, Ammerländer Heerstraße 114-118, 26129, Oldenburg, Germany
Dietmar Pfeifer

Additional information

Dedicated to the memory of Richard Dubes

Cite this paper

Gordon, A.D. (1996). Null Models in Cluster Validation. In: Gaul, W., Pfeifer, D. (eds) From Data to Knowledge. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-79999-0_3

DOI: https://doi.org/10.1007/978-3-642-79999-0_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60354-2
Online ISBN: 978-3-642-79999-0
eBook Packages: Springer Book Archive