Parameter-Free Hierarchical Co-clustering by n-Ary Splits

Ienco, Dino; Pensa, Ruggero G.; Meo, Rosa

doi:10.1007/978-3-642-04180-8_55

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5781)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

Clustering high-dimensional data is challenging. Classic metrics fail to identify real similarities between objects. Moreover, the huge number of features makes the clusters hard to interpret. To tackle these problems, several co-clustering approaches have been proposed that try to compute a partition of objects and a partition of features simultaneously. Unfortunately, these approaches identify only a predefined number of flat co-clusters. It is more useful to arrange the clusters hierarchically, because the hierarchy provides insights into the clusters. In this paper we propose a novel hierarchical co-clustering approach, which builds two coupled hierarchies, one on the objects and one on the features, thus providing insights into both of them. Our approach does not require a pre-specified number of clusters, and produces compact hierarchies because it makes n-ary splits, where n is automatically determined. We validate our approach on several high-dimensional datasets against state-of-the-art competitors.

Author information

Authors and Affiliations

Department of Computer Science, University of Torino, I-10149, Turin, Italy
Dino Ienco, Ruggero G. Pensa & Rosa Meo

Editor information

Editors and Affiliations

NICTA, Locked Bag 8001, Canberra, 2601, Australia and Helsinki Institute of IT, Finland
Wray Buntine
Dept. of Knowledge Technologies, Jožef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
Marko Grobelnik & Dunja Mladenić
University College London, The Centre for Computational Statistics and Machine Learning Department of Computer Science, Gower St., WC1E 6BT, London, UK
John Shawe-Taylor

Cite this paper

Ienco, D., Pensa, R.G., Meo, R. (2009). Parameter-Free Hierarchical Co-clustering by n-Ary Splits. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2009. Lecture Notes in Computer Science, vol 5781. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04180-8_55

DOI: https://doi.org/10.1007/978-3-642-04180-8_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04179-2
Online ISBN: 978-3-642-04180-8
eBook Packages: Computer Science, Computer Science (R0)
