
Null Models in Cluster Validation

  • Conference paper

Summary

A brief overview is given of the problem of validation in classification studies. Attention is concentrated on the specification of appropriate null models for data, with respect to which one may assess some cluster structure that has been obtained as the output of a clustering algorithm. In addition to standard null models, a discussion is given of ‘data-influenced’ null models, in which the precise form of the null hypothesis is influenced by characteristics of the data set under investigation. To illustrate the importance of specifying relevant null models, the behaviour of U-statistics under these null models is used to assess individual clusters found when data were classified using some standard clustering criteria implemented in an agglomerative algorithm.
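The approach in the summary can be illustrated with a minimal sketch. Several assumptions are built in: Euclidean data; a standard null model in which points are uniform on the bounding box of the data; group-average linkage as the agglomerative algorithm; and a Mann-Whitney-type U count that compares within-cluster distances with distances from the cluster to the rest of the data. The function names (`uniform_box_null`, `average_link`, `normalised_u`, `monte_carlo_p`) are hypothetical. The Monte Carlo design, which compares the observed U with the smallest U among clusters found in null data, is an illustrative choice and is not claimed to be the paper's procedure. A data-influenced null model could be tried by replacing `uniform_box_null` with a sampler whose support depends more closely on the data; sampling uniformly over the convex hull is one possibility.

```python
import math
import random
from itertools import combinations


def uniform_box_null(points, rng):
    """Assumed standard null model: n points drawn uniformly from the
    axis-aligned bounding box of the observed data."""
    dims = range(len(points[0]))
    lo = [min(p[j] for p in points) for j in dims]
    hi = [max(p[j] for p in points) for j in dims]
    return [tuple(rng.uniform(lo[j], hi[j]) for j in dims) for _ in points]


def average_link(points, k):
    """Naive group-average agglomerative clustering, stopped at k clusters."""
    clusters = [[i] for i in range(len(points))]

    def link(a, b):
        # Mean Euclidean distance between members of clusters a and b.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Merge the pair of clusters with the smallest average linkage.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: link(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def normalised_u(points, cluster):
    """Mann-Whitney-type count of (within, between) distance pairs in which
    the within-cluster distance is larger (ties count 1/2), divided by the
    number of pairs. 0: every within-cluster distance is smaller than every
    cluster-to-rest distance; values near 1: the reverse."""
    members = set(cluster)
    rest = [i for i in range(len(points)) if i not in members]
    within = [math.dist(points[i], points[j]) for i, j in combinations(cluster, 2)]
    between = [math.dist(points[i], points[j]) for i in cluster for j in rest]
    if not within or not between:
        return None  # undefined for a singleton or for the whole data set
    u = sum((w > b) + 0.5 * (w == b) for w in within for b in between)
    return u / (len(within) * len(between))


def monte_carlo_p(points, k, cluster, reps=99, seed=0):
    """Monte Carlo p-value for one observed cluster: how often does the
    best-looking (smallest-U) cluster found by the same algorithm in null
    data look at least as good as the observed one?"""
    rng = random.Random(seed)
    observed = normalised_u(points, cluster)
    if observed is None:
        raise ValueError("cluster needs >= 2 members and must not be all points")
    hits = 0
    for _ in range(reps):
        null = uniform_box_null(points, rng)
        scores = [normalised_u(null, c) for c in average_link(null, k)]
        best = min((s for s in scores if s is not None), default=1.0)
        hits += best <= observed
    return (hits + 1) / (reps + 1)
```

With two compact, well-separated groups, `average_link(points, 2)` recovers them and each group has `normalised_u` equal to 0. `monte_carlo_p` then reports how often structureless data, clustered by the same algorithm, produce a cluster that looks as good. The minimum is taken over the null clusters because the observed cluster was itself selected by the algorithm. For a fair comparison, it should be compared with the best cluster that the same algorithm extracts from null data.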


Additional information

Dedicated to the memory of Richard Dubes

Copyright information

© 1996 Springer-Verlag Berlin · Heidelberg

About this paper

Cite this paper

Gordon, A.D. (1996). Null Models in Cluster Validation. In: Gaul, W., Pfeifer, D. (eds) From Data to Knowledge. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-79999-0_3

  • DOI: https://doi.org/10.1007/978-3-642-79999-0_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-60354-2

  • Online ISBN: 978-3-642-79999-0

  • eBook Packages: Springer Book Archive