Two-Way Clustering for Contingency Tables: Maximizing a Dependence Measure

Bock, Hans-Hermann

doi:10.1007/978-3-642-18991-3_17

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization

Abstract

We consider the simultaneous clustering of the rows and columns of a contingency table such that the dependence between row clusters and column clusters is maximized, as measured by a general dependence measure. As this measure we use Csiszár’s φ-divergence between the given two-way distribution and the independence distribution with the same margins. This family includes the classical χ² measure, Kullback-Leibler’s discriminating information, and the variation distance. Using the general theory of ‘convexity-based clustering criteria’ (Bock 1992, 2002a, 2002b), we derive a k-means-like clustering algorithm that uses ‘maximum support-plane partitions’ (in terms of likelihood ratio vectors) in the same way as classical sum-of-squares (SSQ) clustering uses ‘minimum-distance partitions’.
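To make the criterion concrete, here is a minimal NumPy sketch under stated assumptions. Writing p(A_s, B_t) for the relative frequencies of the table aggregated over row cluster A_s and column cluster B_t, and p(A_s), p(B_t) for the cluster margins, the dependence measure is taken as the sum over all s, t of p(A_s) p(B_t) φ(p(A_s, B_t) / (p(A_s) p(B_t))), with φ convex and φ(1) = 0. Then φ(λ) = (λ − 1)² gives the χ² measure, φ(λ) = λ log λ gives the Kullback-Leibler information (the mutual information of the two cluster labels), and φ(λ) = |λ − 1| gives the variation distance Σ|p(A_s, B_t) − p(A_s) p(B_t)|. The ratios inside φ are likelihood ratios of the joint distribution against independence. The names `PHI`, `dependence` and `two_way_cluster` are hypothetical. Importantly, the optimizer below is a generic single-move exchange (hill-climbing) heuristic that is easy to verify; it is NOT the maximum support-plane algorithm derived in the paper.

```python
import numpy as np

# Convex generators phi with phi(1) = 0 for Csiszar's phi-divergence.
# The dictionary and its keys are illustrative names, not taken from the paper.
PHI = {
    # classical chi-squared measure
    "chi2": lambda lam: (lam - 1.0) ** 2,
    # Kullback-Leibler information, using the convention 0 * log 0 = 0
    "kl": lambda lam: np.where(lam > 0, lam * np.log(np.where(lam > 0, lam, 1.0)), 0.0),
    # variation distance (sum of absolute differences)
    "variation": lambda lam: np.abs(lam - 1.0),
}


def dependence(table, rows, cols, k, l, phi):
    """phi-divergence between the table aggregated to k x l clusters and
    the independence table with the same cluster margins."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()                       # relative frequencies p_ij
    R = np.eye(k)[rows]                   # m x k row-cluster indicator matrix
    C = np.eye(l)[cols]                   # n x l column-cluster indicator matrix
    agg = R.T @ p @ C                     # p(A_s, B_t)
    q = agg.sum(axis=1, keepdims=True) * agg.sum(axis=0, keepdims=True)  # p(A_s) p(B_t)
    pos = q > 0                           # cells of empty clusters contribute 0
    lam = np.where(pos, agg / np.where(pos, q, 1.0), 1.0)  # likelihood ratios
    return float(np.sum(q * phi(lam)))


def two_way_cluster(table, k, l, phi, rows=None, cols=None, max_sweeps=50, seed=0):
    """Hypothetical greedy exchange heuristic (NOT the paper's support-plane
    algorithm): move single rows, then single columns, to the cluster that
    most increases the criterion, until no single move improves it."""
    m, n = np.shape(table)
    rng = np.random.default_rng(seed)
    # random initial partitions unless labels are supplied (copied, not mutated)
    rows = rng.integers(k, size=m) if rows is None else np.array(rows, dtype=int)
    cols = rng.integers(l, size=n) if cols is None else np.array(cols, dtype=int)
    best = dependence(table, rows, cols, k, l, phi)
    for _ in range(max_sweeps):
        improved = False
        for labels, n_clusters in ((rows, k), (cols, l)):
            for i in range(len(labels)):
                keep = labels[i]
                for c in range(n_clusters):
                    labels[i] = c         # tentatively move object i to cluster c
                    value = dependence(table, rows, cols, k, l, phi)
                    if value > best + 1e-12:
                        best, keep, improved = value, c, True
                labels[i] = keep          # settle on the best cluster found
        if not improved:                  # local maximum reached
            break
    return rows, cols, best


if __name__ == "__main__":
    # Toy table with two obvious row/column blocks (made-up counts).
    T = np.array([[10, 10, 0, 0],
                  [10, 10, 0, 0],
                  [0, 0, 10, 10],
                  [0, 0, 10, 10]], dtype=float)
    for name, phi in PHI.items():
        r, c, value = two_way_cluster(T, 2, 2, phi, seed=1)
        print(name, r, c, round(value, 4))
```

Each trial move recomputes the criterion from scratch at O(mn) cost, so this brute-force version only suits small tables; its advantage is that it works for any φ, since it only evaluates the criterion. Like k-means, the ascent stops at a local optimum, so several starts (different `seed` values) are advisable.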

References

BOCK, H.-H. (1968): Statistische Modelle für die einfache und doppelte Klassifikation von normalverteilten Beobachtungen. Dissertation, Univ. of Freiburg, 1968
BOCK, H.-H. (1972): Statistische Modelle und Bayes’sche Verfahren zur Bestimmung einer unbekannten Klassifikation normalverteilter zufälliger Vektoren. Metrika 18, 120–132
BOCK, H.-H. (1974): Automatische Klassifikation. Mathematische und statistische Methoden zur Strukturierung von Daten (Clusteranalyse). Vandenhoeck & Ruprecht, Göttingen.
BOCK, H.-H. (1983): A clustering algorithm for choosing optimal classes for the chi-squared test. Bull. Intern. Statist. Inst., 44th Session, Madrid 1983, Vol. II: Contributed papers, 758–762
BOCK, H.-H. (1992): A clustering algorithm for maximizing φ-divergence, non-centrality and discriminating power. In: M. Schader (ed.): Analyzing and modeling data and knowledge. Springer-Verlag, Heidelberg, 1991, 19–36
BOCK, H.-H. (1994): Information and entropy in cluster analysis. In: H. Bozdogan et al. (eds.): The Frontiers of statistical modeling: an informational approach. Proc. First US/Japan Conference on Statistical Modeling, Knoxville, Tennessee, May 1992. Kluwer Academic Press, Dordrecht, 1994, Vol. II, 115–147
BOCK, H.-H. (2002a): Clustering methods with convexity-based clustering criteria with applications. Statistical Methods and Applications (submitted)
BOCK, H.-H. (2002b): Two-way clustering for probability distributions: maximally dependent clusters. (Preprint)
CASTILLO, W. and TREJOS, J. (2002): Two-mode partitioning: review of methods and applications in tabu search. In: K. Jajuga, A. Sokolowski, H.-H. Bock (eds.): Classification, clustering, and related topics. Recent advances and applications. Studies in Classification, Data Analysis, and Knowledge Organization. Springer-Verlag, Heidelberg, 43–51
CELEUX, G., DIDAY, E., GOVAERT, G., LECHEVALLIER, Y. and RALAMBONDRAINY, H. (1989): Classification automatique des données. Dunod, Paris. Chapter 2.6
CSISZÁR, I. (1967): Information-type measures of difference of probability distributions and indirect observations. Studia Scientiarum Mathematicarum Hungarica 2, 299–318
GAUL, W. and SCHADER, M. (1996): A new algorithm for two-mode clustering. In: H.-H. Bock, W. Polasek (eds.): Data analysis and information systems. Studies in Classification, Data Analysis, and Knowledge Organization. Springer-Verlag, Heidelberg, 15–23
GOVAERT, G. (1983): Classification croisée. Thèse d’Etat, Université de Paris VI.
PÖTZELBERGER, K. and STRASSER, H. (2001): Clustering and quantization by MSP-partitions. Statistics and Decisions 19, 331–371
STRASSER, H. (2000): Towards a statistical theory of optimal quantization. In: W. Gaul, O. Opitz, and M. Schader (eds.): Data analysis. Scientific modeling and practical application. Springer-Verlag, Heidelberg, 2000, 369–383.

Author information

Authors and Affiliations

Institute of Statistics, Aachen University, D-52056, Aachen, Germany
Hans-Hermann Bock

Editor information

Editors and Affiliations

Department of Information Systems, University of Mannheim, Schloss, 68131, Mannheim, Germany
Martin Schader
Institute of Decision Theory, University of Karlsruhe, Kaiserstr. 12, 76128, Karlsruhe, Germany
Wolfgang Gaul
Department of Statistics, University of Rome, Piazzale Aldo Moro, 00185, Rome, Italy
Maurizio Vichi

About this paper

Cite this paper

Bock, HH. (2003). Two-Way Clustering for Contingency Tables: Maximizing a Dependence Measure. In: Schader, M., Gaul, W., Vichi, M. (eds) Between Data Science and Applied Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-18991-3_17

DOI: https://doi.org/10.1007/978-3-642-18991-3_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40354-8
Online ISBN: 978-3-642-18991-3
eBook Packages: Springer Book Archive