A Unified Adaptive Co-identification Framework for High-D Expression Data

  • Shuzhong Zhang
  • Kun Wang
  • Cody Ashby
  • Bilian Chen
  • Xiuzhen Huang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7632)


High-throughput techniques are producing large-scale, high-dimensional genome-wide gene expression data (e.g., 4D data indexed by genes, timepoints, conditions, and tissues). This creates a growing demand for effective methods that partition the data into biologically relevant groups. Current clustering and co-clustering approaches have limitations: they can be very time consuming, and they work only for low-dimensional expression datasets. In this work, we introduce a new notion of “co-identification”, which allows the systematic identification of genes participating in different functional groups under different conditions or at different developmental stages. The key contribution of our work is a unified computational framework for co-identification that enables clustering to be high-dimensional and adaptive. Our framework is based on a generic optimization model and a general optimization method termed Maximum Block Improvement. Test results on yeast and Arabidopsis expression data are presented to demonstrate the high efficiency and effectiveness of our approach.
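
To give a feel for the block-wise optimization the abstract refers to, the following is a minimal sketch and not the authors' implementation. It assumes a 2D expression matrix (the paper targets higher-dimensional data), a block-constant approximation model with a sum-of-squared-residuals objective, and three blocks of variables: row assignments, column assignments, and a core matrix of block means. The function names (`mbi_cocluster`, `best_rows`, `best_cols`, `best_core`) are hypothetical. In each iteration, every block is optimized with the others held fixed, and only the single block update that improves the objective the most is applied, which is the defining step of a Maximum Block Improvement scheme.

```python
import numpy as np

def objective(A, r, c, C):
    # Sum of squared residuals between A and its block-constant approximation.
    return float(((A - C[np.ix_(r, c)]) ** 2).sum())

def best_core(A, r, c, k1, k2):
    # Optimal core for fixed assignments: the mean of each co-cluster block.
    C = np.zeros((k1, k2))
    for i in range(k1):
        for j in range(k2):
            block = A[np.ix_(r == i, c == j)]
            if block.size:
                C[i, j] = block.mean()
    return C

def best_rows(A, c, C):
    # cost[m, i] = squared error of placing row m in row cluster i.
    cost = ((A[:, None, :] - C[:, c][None, :, :]) ** 2).sum(axis=2)
    return cost.argmin(axis=1)

def best_cols(A, r, C):
    # cost[n, j] = squared error of placing column n in column cluster j.
    cost = ((A[:, :, None] - C[r][:, None, :]) ** 2).sum(axis=0)
    return cost.argmin(axis=1)

def mbi_cocluster(A, k1, k2, iters=100, seed=0, tol=1e-12):
    # Assumed setup: random initial assignments, then MBI-style updates.
    rng = np.random.default_rng(seed)
    r = rng.integers(k1, size=A.shape[0])
    c = rng.integers(k2, size=A.shape[1])
    C = best_core(A, r, c, k1, k2)
    f = objective(A, r, c, C)
    history = [f]
    for _ in range(iters):
        # One candidate per block, each optimal with the other blocks fixed.
        candidates = [
            (best_rows(A, c, C), c, C),
            (r, best_cols(A, r, C), C),
            (r, c, best_core(A, r, c, k1, k2)),
        ]
        values = [objective(A, *cand) for cand in candidates]
        best = int(np.argmin(values))
        if f - values[best] <= tol:
            break  # no block yields a further improvement
        # Apply only the block update with the maximum improvement.
        r, c, C = candidates[best]
        f = values[best]
        history.append(f)
    return r, c, C, history

# Toy data with a planted 2 x 2 block structure (illustrative values only).
A = np.kron(np.array([[1.0, 5.0], [9.0, 2.0]]), np.ones((3, 4)))
r, c, C, history = mbi_cocluster(A, k1=2, k2=2)
```

The recorded objective values in `history` decrease strictly until no single block update helps. Like other block-coordinate methods, this sketch can stop at a local solution that depends on the random initialization.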





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Shuzhong Zhang (1)
  • Kun Wang (3)
  • Cody Ashby (3)
  • Bilian Chen (2)
  • Xiuzhen Huang (3)

  1. University of Minnesota, Minneapolis, USA
  2. Xiamen University, Xiamen, China
  3. Arkansas State University, Jonesboro, USA
