Diagonal Co-clustering Algorithm for Document-Word Partitioning

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9385)

Abstract

We propose a novel diagonal co-clustering algorithm, built upon the double Kmeans, to address the problem of document-word co-clustering. At each iteration, the proposed algorithm seeks a diagonal block structure in the data by minimizing a criterion based on the within-cluster variance and the centroid effect. In addition to being easy to interpret and efficient on sparse binary and continuous data, Diagonal Double Kmeans (DDKM) is also faster than other state-of-the-art clustering algorithms. We illustrate our contribution using real datasets commonly used in document clustering.
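The abstract does not spell out the DDKM criterion, so the following is only a rough sketch of the general idea of searching for a diagonal block structure, not the authors' algorithm. It assumes a plain least-squares objective in which an entry is modelled by a block value when its row (document) and column (word) share a cluster, and by zero otherwise. The names `diagonal_cocluster`, `criterion`, `g`, and `mu` are illustrative assumptions.

```python
import numpy as np

def criterion(X, rows, cols, mu):
    """Squared error between X and its block-diagonal approximation."""
    same = rows[:, None] == cols[None, :]           # True inside diagonal blocks
    model = np.where(same, mu[rows][:, None], 0.0)  # assumed model: mu[k] or 0
    return float(((X - model) ** 2).sum())

def diagonal_cocluster(X, g, n_iter=20, seed=0):
    """Alternating least-squares sketch (assumed objective, not DDKM itself)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)
    rows = rng.integers(g, size=n)   # random initial row labels in 0..g-1
    cols = rng.integers(g, size=d)   # random initial column labels in 0..g-1
    mu = np.zeros(g)
    history = []
    for _ in range(n_iter):
        # Each diagonal block value is the mean of the entries in that block.
        for k in range(g):
            block = X[np.ix_(rows == k, cols == k)]
            mu[k] = block.mean() if block.size else 0.0
        # Row step: cost of cluster k for row i, up to a constant in k, is
        # -2 * mu[k] * sum_{j in k} x_ij + mu[k]^2 * (number of columns in k).
        C = np.eye(g)[cols]          # d x g one-hot column memberships
        rows = (-2 * (X @ C) * mu + mu**2 * C.sum(axis=0)).argmin(axis=1)
        # Column step: the same update with the roles of rows and columns swapped.
        R = np.eye(g)[rows]          # n x g one-hot row memberships
        cols = (-2 * (X.T @ R) * mu + mu**2 * R.sum(axis=0)).argmin(axis=1)
        history.append(criterion(X, rows, cols, mu))
    return rows, cols, mu, history
```

Under these assumptions each of the three steps can only lower or preserve the squared error, so `history` is non-increasing. As with any k-means style procedure, the result depends on the random initialization and may be a poor local optimum.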

Notes

  1. http://adios.tau.ac.il/.

  2. The balance coefficient is defined as the ratio of the number of documents in the smallest class to the number of documents in the largest class.
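As a small illustration of the definition in note 2, the balance coefficient can be computed from a list of class labels, one per document. The function name `balance_coefficient` and the example labels are assumptions made for this sketch.

```python
from collections import Counter

def balance_coefficient(labels):
    """Ratio of the smallest class size to the largest class size."""
    sizes = Counter(labels).values()   # number of documents per class
    return min(sizes) / max(sizes)     # raises ValueError if labels is empty

# Three documents in one class and one in another give a ratio of 1/3.
ratio = balance_coefficient(["sport", "sport", "sport", "politics"])
```

A value of 1 means all classes have the same size, and values close to 0 indicate a strongly unbalanced dataset.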

Author information

Corresponding authors

Correspondence to Charlotte Laclau or Mohamed Nadif.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Laclau, C., Nadif, M. (2015). Diagonal Co-clustering Algorithm for Document-Word Partitioning. In: Fromont, E., De Bie, T., van Leeuwen, M. (eds) Advances in Intelligent Data Analysis XIV. IDA 2015. Lecture Notes in Computer Science, vol 9385. Springer, Cham. https://doi.org/10.1007/978-3-319-24465-5_15

  • DOI: https://doi.org/10.1007/978-3-319-24465-5_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24464-8

  • Online ISBN: 978-3-319-24465-5

  • eBook Packages: Computer Science, Computer Science (R0)
