Fast Information-Theoretic Agglomerative Co-clustering

  • Tiantian Gao
  • Leman Akoglu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8506)

Abstract

Jointly clustering the rows and the columns of large matrices, a.k.a. co-clustering, has numerous real-world applications, such as collaborative filtering, market-basket and microarray data analysis, and graph clustering. In this paper, we formulate an information-theoretic objective cost function for this problem and develop a fast agglomerative algorithm to optimize it. Our algorithm iteratively merges highly similar clusters, which it finds rapidly using Locality-Sensitive Hashing. Thanks to its bottom-up nature, it also enables the analysis of cluster hierarchies. Finally, the numbers of row and column clusters are determined automatically, without requiring the user to choose them. Our experiments on both real and synthetic datasets show that the proposed algorithm achieves high-quality clustering solutions and scales linearly with the input matrix size.
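The abstract describes the approach only at a high level. The Python sketch below illustrates the general idea of LSH-accelerated agglomerative co-clustering under several assumptions that are ours, not the paper's: the input is a 0-1 matrix; the cost is an MDL-style two-part code in which each (row cluster × column cluster) block is encoded at its own density (block size times binary entropy) plus log2(size + 1) bits for its count of ones; the Locality-Sensitive Hashing scheme is MinHash with banding, applied to the set of mostly-ones blocks in each cluster's profile; and each pass greedily performs the single best cost-reducing merge, alternating between rows and columns. All function names (`block_cost`, `lsh_candidates`, `merge_pass`, `co_cluster`, etc.) are hypothetical.

```python
import random
from math import log2

P = 2_147_483_647  # Mersenne prime for universal hashing (assumed choice)


def block_cost(ones, size):
    """Assumed MDL-style cost of one block: size * binary entropy of its
    density, plus log2(size + 1) bits to transmit its count of ones."""
    if size == 0:
        return 0.0
    bits = log2(size + 1)
    for count in (ones, size - ones):
        if count:
            bits -= count * log2(count / size)
    return bits


def profile(mat, group, others):
    """Ones count of block (group x c) for every cluster c on the other axis."""
    return [sum(mat[i][j] for i in group for j in c) for c in others]


def merge_delta(pa, pb, na, nb, others):
    """Change in total cost if two clusters of sizes na and nb are merged."""
    delta = 0.0
    for oa, ob, c in zip(pa, pb, others):
        w = len(c)
        delta += (block_cost(oa + ob, (na + nb) * w)
                  - block_cost(oa, na * w) - block_cost(ob, nb * w))
    return delta


def lsh_candidates(sets, num_hashes, band_size, rng):
    """MinHash + banding: index pairs whose signatures agree on some band."""
    coeffs = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(num_hashes)]
    buckets = {}
    for idx, s in enumerate(sets):
        # An empty set gets the sentinel P, which no (a*x + b) % P can equal.
        sig = [min(((a * x + b) % P for x in s), default=P) for a, b in coeffs]
        for start in range(0, num_hashes, band_size):
            key = (start, tuple(sig[start:start + band_size]))
            buckets.setdefault(key, []).append(idx)
    pairs = set()
    for members in buckets.values():
        for i, a in enumerate(members):
            for b in members[i + 1:]:
                pairs.add((a, b))  # a < b: indices were appended in order
    return pairs


def merge_pass(mat, groups, others, rng, num_hashes, band_size):
    """Apply the best cost-reducing merge among LSH candidates on one axis;
    return True if a merge happened."""
    profs = [profile(mat, g, others) for g in groups]
    # Each cluster is hashed by the set of other-axis clusters where its block is mostly ones.
    dense = [{c for c, ones in enumerate(p) if 2 * ones > len(g) * len(others[c])}
             for g, p in zip(groups, profs)]
    best, best_pair = 0.0, None
    for a, b in lsh_candidates(dense, num_hashes, band_size, rng):
        d = merge_delta(profs[a], profs[b], len(groups[a]), len(groups[b]), others)
        if d < best:
            best, best_pair = d, (a, b)
    if best_pair is None:
        return False
    a, b = best_pair
    groups[a].extend(groups[b])
    del groups[b]  # safe: b > a, so index a is unaffected
    return True


def co_cluster(matrix, num_hashes=8, band_size=2, seed=0):
    """Bottom-up co-clustering of a 0-1 matrix; stops when neither axis has
    an LSH candidate merge that lowers the cost."""
    rng = random.Random(seed)
    transposed = [list(col) for col in zip(*matrix)]
    rows = [[i] for i in range(len(matrix))]
    cols = [[j] for j in range(len(transposed))]
    changed = True
    while changed:
        row_merged = merge_pass(matrix, rows, cols, rng, num_hashes, band_size)
        col_merged = merge_pass(transposed, cols, rows, rng, num_hashes, band_size)
        changed = row_merged or col_merged
    return sorted(map(sorted, rows)), sorted(map(sorted, cols))
```

For example, on a 6×6 0-1 matrix consisting of two all-ones 3×3 blocks on the diagonal, `co_cluster` returns the row clusters `[[0, 1, 2], [3, 4, 5]]` and the same column clusters. The number of clusters on each axis falls out of the stopping rule rather than being supplied by the user. Profiles are recomputed from scratch on every pass, so this sketch makes no attempt at the linear scaling reported in the abstract.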

Keywords

Adjacency Matrix, Hash Table, Synthetic Dataset, Minimum Description Length, Subspace Cluster
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Tiantian Gao (1)
  • Leman Akoglu (1)
  1. Department of Computer Science, Stony Brook University, USA
