Diagonal Co-clustering Algorithm for Document-Word Partitioning

Laclau, Charlotte; Nadif, Mohamed

doi:10.1007/978-3-319-24465-5_15

Charlotte Laclau & Mohamed Nadif

Conference paper
First Online: 22 November 2015

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9385)

Abstract

We propose a novel diagonal co-clustering algorithm, built upon the double Kmeans, to address the problem of document-word co-clustering. At each iteration, the proposed algorithm seeks a diagonal block structure in the data by minimizing a criterion based on the within-block variance and the centroid effect. In addition to being easy to interpret and efficient on sparse binary and continuous data, Diagonal Double Kmeans (DDKM) is also faster than other state-of-the-art clustering algorithms. We illustrate our contribution using real datasets commonly used in document clustering.
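As a rough illustration of the diagonal idea, the sketch below pairs row cluster k with column cluster k, models only these g diagonal blocks by a mean γ_k, and approximates all off-diagonal cells by 0. It assumes the criterion is the squared error ||X − Z diag(γ) Wᵀ||², where Z and W are the row and column indicator matrices. With γ_k set to the block means, this error splits into the within-block variance of the diagonal blocks plus the squared mass left off the diagonal, and it equals ||X||² − Σ_k n_k d_k γ_k² (n_k rows and d_k columns in cluster k). Minimizing it therefore favours large, dense diagonal blocks, which is one way to read "within variance plus centroid effect". This criterion and the update order are assumptions made for illustration, not necessarily the exact DDKM formulation from the paper, and the function names (`diagonal_double_kmeans`, `block_means`, `criterion`) are hypothetical.

```python
import numpy as np


def block_means(X, z, w, g):
    """Mean gamma_k of each diagonal block X[z == k, w == k]; 0 if the block is empty."""
    gamma = np.zeros(g)
    for k in range(g):
        block = X[np.ix_(z == k, w == k)]
        if block.size > 0:
            gamma[k] = block.mean()
    return gamma


def criterion(X, z, w, gamma):
    """Squared error ||X - Z diag(gamma) W^T||^2; off-diagonal blocks are approximated by 0."""
    g = len(gamma)
    Z = np.eye(g)[z]  # n x g row-cluster indicator matrix
    W = np.eye(g)[w]  # d x g column-cluster indicator matrix
    return float(((X - Z @ np.diag(gamma) @ W.T) ** 2).sum())


def diagonal_double_kmeans(X, g, n_iter=50, seed=0):
    """Hypothetical sketch: alternate row and column reassignments.

    Each step minimizes the assumed criterion exactly with the other
    quantities fixed, so (up to rounding) the criterion should never increase.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    z = rng.integers(g, size=n)  # random initial row clusters
    w = rng.integers(g, size=d)  # random initial column clusters
    history = []
    for _ in range(n_iter):
        gamma = block_means(X, z, w, g)
        # Row step. Up to the constant sum_j x_ij^2, putting row i in cluster k costs
        # -2 * gamma_k * (sum of x_ij over columns in cluster k) + gamma_k^2 * d_k.
        S = X @ np.eye(g)[w]  # n x g: row sums restricted to each column cluster
        d_k = np.bincount(w, minlength=g)
        z = np.argmin(-2 * S * gamma + gamma**2 * d_k, axis=1)
        gamma = block_means(X, z, w, g)
        # Column step: the same rule with the roles of rows and columns swapped.
        T = X.T @ np.eye(g)[z]  # d x g: column sums restricted to each row cluster
        n_k = np.bincount(z, minlength=g)
        w = np.argmin(-2 * T * gamma + gamma**2 * n_k, axis=1)
        gamma = block_means(X, z, w, g)
        history.append(criterion(X, z, w, gamma))
        if len(history) > 1 and history[-2] - history[-1] < 1e-12:
            break  # no further improvement
    return z, w, gamma, history
```

For example, `z, w, gamma, history = diagonal_double_kmeans(X, g=3)` returns row labels, column labels, the diagonal block means and the criterion after each iteration. As with ordinary k-means, the result depends on the random start, so in practice one would run several seeds and keep the partition with the lowest final criterion.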

Notes

1. http://adios.tau.ac.il/.
2. The balance coefficient is defined as the ratio of the number of documents in the smallest class to the number of documents in the largest class.
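The balance coefficient in note 2 can be computed directly from a list of class labels. The sketch below is a minimal illustration; the function name `balance_coefficient` and the example labels are hypothetical. A value of 1 means all classes have the same size, and smaller values indicate a more imbalanced dataset.

```python
from collections import Counter


def balance_coefficient(labels):
    """Size of the smallest class divided by the size of the largest class."""
    counts = Counter(labels).values()
    return min(counts) / max(counts)


# Hypothetical labels: 2 documents in class "a", 4 documents in class "b"
print(balance_coefficient(["a", "a", "b", "b", "b", "b"]))  # 0.5
```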

Author information

Authors and Affiliations

LIPADE, Université Paris Descartes, 45 Rue des Saint-Pères, 75006, Paris, France
Charlotte Laclau & Mohamed Nadif

Corresponding authors

Correspondence to Charlotte Laclau or Mohamed Nadif.

Editor information

Editors and Affiliations

Université de Saint-Etienne, Saint-Etienne, France
Elisa Fromont
Intelligent Systems Lab, University of Bristol, Bristol, United Kingdom
Tijl De Bie
Informatics Section, Katholieke Universiteit Leuven, Leuven, Belgium
Matthijs van Leeuwen

About this paper

Cite this paper

Laclau, C., Nadif, M. (2015). Diagonal Co-clustering Algorithm for Document-Word Partitioning. In: Fromont, E., De Bie, T., van Leeuwen, M. (eds) Advances in Intelligent Data Analysis XIV. IDA 2015. Lecture Notes in Computer Science, vol 9385. Springer, Cham. https://doi.org/10.1007/978-3-319-24465-5_15

DOI: https://doi.org/10.1007/978-3-319-24465-5_15
Published: 22 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24464-8
Online ISBN: 978-3-319-24465-5
eBook Packages: Computer Science, Computer Science (R0)