Two-Way Clustering for Contingency Tables: Maximizing a Dependence Measure

  • Conference paper
Between Data Science and Applied Data Analysis

Abstract

We consider the simultaneous clustering of the rows and columns of a contingency table such that the dependence between the row clusters and the column clusters is maximal, as quantified by a general dependence measure. As this measure we use Csiszár’s φ-divergence between the given two-way distribution and the independence distribution with the same margins. This family includes the classical χ² measure, Kullback-Leibler’s discriminating information, and the variation distance. Using the general theory of ‘convexity-based clustering criteria’ (Bock 1992, 2002a, 2002b), we derive a k-means-like clustering algorithm. It uses ‘maximum support-plane partitions’ (defined in terms of likelihood ratio vectors) in the same way as classical sum-of-squares (SSQ) clustering uses ‘minimum-distance partitions’.
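
The criterion and the alternating row/column reassignment described above can be sketched as follows. This is a minimal sketch, not the author's implementation. It rests on these assumptions:

- φ(t) = (t − 1)², the χ² case.
- The table is a list of lists of counts.
- The function names (`phi_divergence`, `aggregate`, `msp_step`, `two_way_cluster`) and the `init` parameter are hypothetical.

Each object is summarised by its likelihood-ratio vector λ with respect to the current classes of the other dimension. A row step then assigns each row to the class whose support plane scores highest at the row's own λ. That support plane belongs to the convex function g(λ) = Σ_l p_{·l} φ(λ_l) and is taken at the class centroid, the mass-weighted mean of the λ vectors in the class. A column step does the same on the transposed table. By convexity no step can decrease the divergence of the aggregated table, but, as with k-means, only a local optimum is reached. Other convex φ with φ(1) = 0 can in principle be passed in, such as t·log t for the Kullback-Leibler case, but zero cells would then need special handling.

```python
import random


# Assumed choice of phi: the chi-squared case phi(t) = (t - 1)^2 and its derivative.
def chi2(t):
    return (t - 1.0) ** 2


def dchi2(t):
    return 2.0 * (t - 1.0)


def phi_divergence(table, phi=chi2):
    """Csiszar phi-divergence between the normalized table P and the independence
    table Q with the same margins: sum_ij q_ij * phi(p_ij / q_ij)."""
    n = sum(map(sum, table))
    p = [[x / n for x in row] for row in table]
    r = [sum(row) for row in p]
    c = [sum(col) for col in zip(*p)]
    return sum(r[i] * c[j] * phi(p[i][j] / (r[i] * c[j]))
               for i in range(len(r)) for j in range(len(c))
               if r[i] > 0 and c[j] > 0)


def aggregate(table, rows, cols, K, L):
    """Collapse the table into a K x L table of row-cluster x column-cluster sums."""
    agg = [[0.0] * L for _ in range(K)]
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            agg[rows[i]][cols[j]] += x
    return agg


def msp_step(p, rows, cols, K, L, phi, dphi):
    """Reassign the rows of the normalized table p by a maximum support-plane
    rule, holding the column partition `cols` (labels 0..L-1) fixed."""
    r = [sum(row) for row in p]                       # row masses p_i.
    s = [[0.0] * L for _ in p]                        # mass of (row i, column cluster l)
    for i, row in enumerate(p):
        for j, x in enumerate(row):
            s[i][cols[j]] += x
    c = [sum(s[i][l] for i in range(len(p))) for l in range(L)]  # column-cluster masses
    # Likelihood-ratio vector of each row with respect to the column clusters.
    lam = [[s[i][l] / (r[i] * c[l]) if r[i] > 0 and c[l] > 0 else 1.0
            for l in range(L)] for i in range(len(p))]
    # Class centroids: mass-weighted means of the likelihood-ratio vectors.
    mass = [0.0] * K
    cent = [[0.0] * L for _ in range(K)]
    for i, k in enumerate(rows):
        mass[k] += r[i]
        for l in range(L):
            cent[k][l] += r[i] * lam[i][l]
    live = [k for k in range(K) if mass[k] > 0]      # empty classes are skipped
    for k in live:
        cent[k] = [v / mass[k] for v in cent[k]]

    def plane(k, v):
        # Support plane of g(lambda) = sum_l c_l * phi(lambda_l) at centroid k, evaluated at v.
        m = cent[k]
        return sum(c[l] * (phi(m[l]) + dphi(m[l]) * (v[l] - m[l])) for l in range(L))

    return [max(live, key=lambda k: plane(k, lam[i])) for i in range(len(p))]


def two_way_cluster(table, K, L, phi=chi2, dphi=dchi2, init=None, max_iter=50, seed=0):
    """Alternate row and column MSP steps; return (row labels, column labels, divergence)."""
    n = sum(map(sum, table))
    p = [[x / n for x in row] for row in table]
    pt = [list(col) for col in zip(*p)]               # transposed table for the column step
    if init is None:
        rng = random.Random(seed)
        rows = [rng.randrange(K) for _ in p]
        cols = [rng.randrange(L) for _ in pt]
    else:
        rows, cols = list(init[0]), list(init[1])
    for _ in range(max_iter):
        new_rows = msp_step(p, rows, cols, K, L, phi, dphi)
        new_cols = msp_step(pt, cols, new_rows, L, K, phi, dphi)
        if new_rows == rows and new_cols == cols:
            break
        rows, cols = new_rows, new_cols
    return rows, cols, phi_divergence(aggregate(table, rows, cols, K, L), phi)


if __name__ == "__main__":
    # Toy 4 x 4 table with a block structure (illustrative data only).
    table = [[10, 10, 0, 0], [10, 10, 0, 0], [0, 0, 10, 10], [0, 0, 10, 10]]
    print(two_way_cluster(table, 2, 2, init=([0, 0, 1, 1], [0, 0, 0, 1])))
    # → ([0, 0, 1, 1], [0, 0, 1, 1], 1.0)
```

From the hypothetical starting partition in the usage example, the procedure separates rows and columns into the two diagonal blocks. The aggregated 2 × 2 table then reaches a χ² divergence of 1.0, which is the largest value possible with two row and two column classes. From a random start (`init=None`), the procedure can stall in a poorer local optimum, so several restarts would normally be used.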

References

  • BOCK, H.-H. (1968): Statistische Modelle für die einfache und doppelte Klassifikation von normalverteilten Beobachtungen. Dissertation, Univ. of Freiburg, 1968

  • BOCK, H.-H. (1972): Statistische Modelle und Bayes’sche Verfahren zur Bestimmung einer unbekannten Klassifikation normalverteilter zufälliger Vektoren. Metrika 18, 120–132

  • BOCK, H.-H. (1974): Automatische Klassifikation. Mathematische und statistische Methoden zur Strukturierung von Daten (Clusteranalyse). Vandenhoeck & Ruprecht, Göttingen.

  • BOCK, H.-H. (1983): A clustering algorithm for choosing optimal classes for the chi-squared test. Bull. Intern. Statist. Inst., 44th Session, Madrid 1983, Vol. II: Contributed papers, 758–762

  • BOCK, H.-H. (1992): A clustering algorithm for maximizing φ-divergence, non-centrality and discriminating power. In: M. Schader (ed.): Analyzing and modeling data and knowledge. Springer-Verlag, Heidelberg, 1991, 19–36

  • BOCK, H.-H. (1994): Information and entropy in cluster analysis. In: H. Bozdogan et al. (eds.): The Frontiers of statistical modeling: an informational approach. Proc. First US/Japan Conference on Statistical Modeling, Knoxville, Tennessee, May 1992. Kluwer Academic Press, Dordrecht, 1994, Vol. II, 115–147

  • BOCK, H.-H. (2002a): Clustering methods with convexity-based clustering criteria with applications. Statistical Methods and Applications (submitted)

  • BOCK, H.-H. (2002b): Two-way clustering for probability distributions: maximally dependent clusters. (Preprint)

  • CASTILLO, W. and TREJOS, J. (2002): Two-mode partitioning: review of methods and applications in tabu search. In: K. Jajuga, A. Sokolowski, H.-H. Bock (eds.): Classification, clustering, and related topics. Recent advances and applications. Studies in Classification, Data Analysis, and Knowledge Organization. Springer-Verlag, Heidelberg, 43–51

  • CELEUX, G., DIDAY, E., GOVAERT, G., LECHEVALLIER, Y. and RALAMBONDRAINY, H. (1989): Classification automatique des données. Dunod, Paris. Chapter 2.6

  • CSISZÁR, I. (1967): Information-type measures of difference of probability distributions and indirect observations. Studia Scientiarum Mathematicarum Hungarica 2, 299–318

  • GAUL, W. and SCHADER, M. (1996): A new algorithm for two-mode clustering. In: H.-H. Bock, W. Polasek (eds.): Data analysis and information systems. Studies in Classification, Data Analysis, and Knowledge Organization. Springer-Verlag, Heidelberg, 15–23

  • GOVAERT, G. (1983): Classification croisée. Thèse d’Etat, Université de Paris VI.

  • PÖTZELBERGER, K. and STRASSER, H. (2001): Clustering and quantization by MSP-partitions. Statistics and Decisions 19, 331–371

  • STRASSER, H. (2000): Towards a statistical theory of optimal quantization. In: W. Gaul, O. Opitz, and M. Schader (eds.): Data analysis. Scientific modeling and practical application. Springer-Verlag, Heidelberg, 2000, 369–383.

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bock, HH. (2003). Two-Way Clustering for Contingency Tables: Maximizing a Dependence Measure. In: Schader, M., Gaul, W., Vichi, M. (eds) Between Data Science and Applied Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-18991-3_17

  • DOI: https://doi.org/10.1007/978-3-642-18991-3_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40354-8

  • Online ISBN: 978-3-642-18991-3

  • eBook Packages: Springer Book Archive
