Abstract
We consider the simultaneous clustering of the rows and columns of a contingency table such that the dependence between row clusters and column clusters is maximized with respect to a general dependence measure. As this measure we use Csiszár’s φ-divergence between the given two-way distribution and the independence case with the same margins. This family includes the classical χ² measure, the Kullback-Leibler discriminating information, and the variation distance. Using the general theory of ‘convexity-based clustering criteria’ (Bock 1992, 2002a, 2002b), we derive a k-means-like clustering algorithm that uses ‘maximum support-plane partitions’ (in terms of likelihood ratio vectors) in the same way as classical SSQ clustering uses ‘minimum-distance partitions’.
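To make the criterion concrete, the following minimal sketch evaluates a φ-divergence dependence measure for a given pair of row and column partitions. It is an illustration only, not the chapter's algorithm: the function names (`aggregate`, `phi_dependence`), the example table `P`, and the convention D = Σ q·φ(p/q), with q the product of the margins, are assumptions introduced here. Under that convention, φ(t) = (t − 1)² gives the χ² measure, φ(t) = t·log t the Kullback-Leibler information, and φ(t) = |t − 1| the variation distance (up to a constant factor that depends on the convention used).

```python
import math

def aggregate(table, row_labels, col_labels, m, n):
    """Sum a joint probability table into an m x n table of cluster masses.

    row_labels[i] / col_labels[j] give the cluster index of row i / column j.
    (Hypothetical helper, not taken from the chapter.)
    """
    agg = [[0.0] * n for _ in range(m)]
    for i, row in enumerate(table):
        for j, p in enumerate(row):
            agg[row_labels[i]][col_labels[j]] += p
    return agg

def phi_dependence(joint, phi):
    """phi-divergence between a joint distribution and the product of its margins.

    Assumed convention: sum over cells of q * phi(p / q), where q is the
    product of the corresponding row and column margins.
    """
    row_m = [sum(row) for row in joint]
    col_m = [sum(col) for col in zip(*joint)]
    total = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            q = row_m[i] * col_m[j]
            if q > 0:  # cells with zero margin mass contribute nothing
                total += q * phi(p / q)
    return total

# Three members of the phi-divergence family named in the abstract.
def chi2(t):
    return (t - 1.0) ** 2

def kl(t):
    return t * math.log(t) if t > 0 else 0.0

def variation(t):
    return abs(t - 1.0)

# Made-up 4 x 4 joint distribution with a visible 2 x 2 block structure.
P = [
    [0.10, 0.10, 0.02, 0.03],
    [0.10, 0.10, 0.03, 0.02],
    [0.02, 0.03, 0.10, 0.10],
    [0.03, 0.02, 0.10, 0.10],
]

# A partition that follows the blocks versus one that cuts across them.
blocks = aggregate(P, [0, 0, 1, 1], [0, 0, 1, 1], 2, 2)
mixed = aggregate(P, [0, 1, 0, 1], [0, 1, 0, 1], 2, 2)

for name, phi in [("chi2", chi2), ("KL", kl), ("variation", variation)]:
    print(name, phi_dependence(blocks, phi), phi_dependence(mixed, phi))
```

For each of the three measures, the partition that follows the block structure yields the larger dependence value, which is the quantity the clustering seeks to maximize. The sketch only scores fixed partitions; the iterative search over partitions is not reproduced here.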
References
BOCK, H.-H. (1968): Statistische Modelle für die einfache und doppelte Klassifikation von normalverteilten Beobachtungen. Dissertation, Univ. of Freiburg, 1968
BOCK, H.-H. (1972): Statistische Modelle und Bayes’sche Verfahren zur Bestimmung einer unbekannten Klassifikation normalverteilter zufälliger Vektoren. Metrika 18, 120–132
BOCK, H.-H. (1974): Automatische Klassifikation. Mathematische und statistische Methoden zur Strukturierung von Daten (Clusteranalyse). Vandenhoeck & Ruprecht, Göttingen.
BOCK, H.-H. (1983): A clustering algorithm for choosing optimal classes for the chi-squared test. Bull. Intern. Statist. Inst., 44th Session, Madrid 1983, Vol. II: Contributed papers, 758–762
BOCK, H.-H. (1992): A clustering algorithm for maximizing φ-divergence, non-centrality and discriminating power. In: M. Schader (ed.): Analyzing and modeling data and knowledge. Springer-Verlag, Heidelberg, 1991, 19–36
BOCK, H.-H. (1994): Information and entropy in cluster analysis. In: H. Bozdogan et al. (eds.): The Frontiers of statistical modeling: an informational approach. Proc. First US/Japan Conference on Statistical Modeling, Knoxville, Tennessee, May 1992. Kluwer Academic Press, Dordrecht, 1994, Vol. II, 115–147
BOCK, H.-H. (2002a): Clustering methods with convexity-based clustering criteria with applications. Statistical Methods and Applications (submitted)
BOCK, H.-H. (2002b): Two-way clustering for probability distributions: maximally dependent clusters. (Preprint)
CASTILLO, W. and TREJOS, J. (2002): Two-mode partitioning: review of methods and applications in tabu search. In: K. Jajuga, A. Sokolowski, H.-H. Bock (eds.): Classification, clustering, and related topics. Recent advances and applications. Studies in Classification, Data Analysis, and Knowledge Organization. Springer-Verlag, Heidelberg, 43–51
CELEUX, G., DIDAY, E., GOVAERT, G., LECHEVALLIER, Y. and RALAMBONDRAINY, H. (1989): Classification automatique des données. Dunod, Paris. Chapitre 2.6
CSISZÁR, I. (1967): Information-type measures of difference of probability distributions and indirect observations. Studia Scientiarum Mathematicarum Hungarica 2, 299–318
GAUL, W. and SCHADER, M. (1996): A new algorithm for two-mode clustering. In: H.-H. Bock, W. Polasek (eds.): Data analysis and information systems. Studies in Classification, Data Analysis, and Knowledge Organization. Springer-Verlag, Heidelberg, 15–23
GOVAERT, G. (1983): Classification croisée. Thèse d’Etat, Université de Paris VI.
PÖTZELBERGER, K. and STRASSER, H. (2001): Clustering and quantization by MSP-partitions. Statistics and Decisions 19, 331–371
STRASSER, H. (2000): Towards a statistical theory of optimal quantization. In: W. Gaul, O. Opitz, and M. Schader (eds.): Data analysis. Scientific modeling and practical application. Springer-Verlag, Heidelberg, 2000, 369–383.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bock, HH. (2003). Two-Way Clustering for Contingency Tables: Maximizing a Dependence Measure. In: Schader, M., Gaul, W., Vichi, M. (eds) Between Data Science and Applied Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-18991-3_17
DOI: https://doi.org/10.1007/978-3-642-18991-3_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40354-8
Online ISBN: 978-3-642-18991-3
eBook Packages: Springer Book Archive