A New Framework for Co-clustering of Gene Expression Data

  • Shuzhong Zhang
  • Kun Wang
  • Bilian Chen
  • Xiuzhen Huang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7036)


A new framework is proposed for co-clustering gene expression data. The framework is based on a generic tensor optimization model and on an optimization method termed Maximum Block Improvement (MBI), recently developed in [3]. It applies not only to gene expression data in which genes expressed under different conditions are represented as 2D matrices, but also, just as readily, to more complex high-dimensional gene expression data with genes expressed in different tissues, at different developmental stages, at different time points, under different stimulations, etc. Moreover, the framework is flexible enough to incorporate a variety of clustering quality measures without difficulty. In this paper, we demonstrate the effectiveness of the approach by detailing one specific implementation of the algorithm and by presenting experimental tests on microarray gene expression datasets. Our results show that the new algorithm is efficient and performs well in identifying patterns in gene expression datasets.
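
To make the block-update idea behind the abstract concrete, here is a minimal sketch for the 2D (matrix) case. It is not the paper's implementation. It assumes a squared-residual objective, three blocks of variables (row assignments, column assignments, co-cluster centers), and the MBI-style rule that only the block whose best response improves the objective most is updated in each iteration. All function and parameter names (mbi_cocluster, best_rows, tol, and so on) are hypothetical.

```python
import numpy as np

def objective(A, r, c, X):
    # Sum of squared residuals between A and its co-cluster centers (assumed measure).
    return float(((A - X[r[:, None], c[None, :]]) ** 2).sum())

def best_rows(A, c, X):
    # Best row assignment with column assignment c and centers X held fixed.
    Xc = X[:, c]                                              # p x n
    cost = ((A[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=2)  # m x p
    return cost.argmin(axis=1)

def best_cols(A, r, X):
    # Best column assignment with row assignment r and centers X held fixed.
    Xr = X[r, :]                                              # m x q
    cost = ((A.T[:, None, :] - Xr.T[None, :, :]) ** 2).sum(axis=2)  # n x q
    return cost.argmin(axis=1)

def best_centers(A, r, c, X):
    # Best centers are block means; empty blocks keep their previous value.
    X_new = X.copy()
    p, q = X.shape
    for k in range(p):
        for l in range(q):
            block = A[np.ix_(r == k, c == l)]
            if block.size:
                X_new[k, l] = block.mean()
    return X_new

def mbi_cocluster(A, p, q, max_iter=200, tol=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    r = rng.integers(p, size=m)          # random initial row clusters
    c = rng.integers(q, size=n)          # random initial column clusters
    X = best_centers(A, r, c, np.full((p, q), A.mean()))
    history = [objective(A, r, c, X)]
    for _ in range(max_iter):
        # One candidate per block, each changing that block only.
        candidates = [
            (best_rows(A, c, X), c, X),
            (r, best_cols(A, r, X), X),
            (r, c, best_centers(A, r, c, X)),
        ]
        values = [objective(A, *cand) for cand in candidates]
        best = int(np.argmin(values))
        if history[-1] - values[best] <= tol:
            break                        # no block gives a real improvement
        r, c, X = candidates[best]       # keep only the maximum improvement
        history.append(values[best])
    return r, c, X, history

# Usage on a small synthetic matrix with a planted 2 x 2 block structure.
A = np.kron(np.array([[0.0, 5.0], [10.0, 15.0]]), np.ones((3, 4)))
rows, cols, centers, history = mbi_cocluster(A, p=2, q=2)
```

Because every candidate is a best response for its block, the recorded objective values never increase. Like other block-wise schemes, the sketch may stop at a local solution that depends on the random start. Higher-order data would add one assignment block per extra tensor mode, which this sketch does not cover.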


Gene Expression Data; Gene Expression Dataset; Yeast Cell Cycle; Assignment Matrix; Yeast Dataset
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


  1. Aguilar-Ruiz, J.S.: Shifting and scaling patterns from gene expression data. Bioinformatics 21(20), 3840–3845 (2005)
  2. Alizadeh, A.A., Eisen, M.B., Davis, R.E., Ma, C., Lossos, I.S., Rosenwald, A., Boldrick, J.C., Sabet, H., Tran, T., Yu, X., Powell, J.I., Yang, L., Marti, G.E., Moore, T., Hudson Jr., J., Lu, L., Lewis, D.B., Tibshirani, R., Sherlock, G., Chan, W.C., Greiner, T.C., Weisenburger, D.D., Armitage, J.O., Warnke, R., Levy, R., Wilson, W., Grever, M.R., Byrd, J.C., Botstein, D., Brown, P.O., Staudt, L.M.: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403(6769), 503–511 (2000)
  3. Chen, B., He, S., Li, Z., Zhang, S.: Maximum block improvement and polynomial optimization (submitted for publication, 2011)
  4. Cheng, Y., Church, G.M.: Biclustering of expression data. In: Proc. Int. Conf. Intell. Syst. Mol. Biol., vol. 8, pp. 93–103 (2000)
  5. Cho, H., Dhillon, I.S., Guan, Y., Sra, S.: Minimum sum-squared residue co-clustering of gene expression data. In: Proceedings of the Fourth SIAM International Conference on Data Mining, pp. 114–125 (2004)
  6. Hartigan, J.A.: Direct clustering of a data matrix. Journal of the American Statistical Association 67(337), 123–129 (1972)
  7. Hochreiter, S., Bodenhofer, U., Heusel, M., Mayr, A., Mitterecker, A., Kasim, A., Khamiakova, T., Sanden, S.V., Lin, D., Talloen, W., Bijnens, L., Göhlmann, H.W.H., Shkedy, Z., Clevert, D.: FABIA: factor analysis for bicluster acquisition. Bioinformatics 26(12), 1520–1527 (2010)
  8. Kilian, J., Whitehead, D., Horak, J., Wanke, D., Weinl, S., Batistic, O., D’Angelo, C., Bornberg-Bauer, E., Kudla, J., Harter, K.: The AtGenExpress global stress expression data set: protocols, evaluation and model data analysis of UV-B light, drought and cold stress responses. The Plant Journal 2, 347–363 (2007)
  9. Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM Review 51(3), 455–500 (2009)
  10. Jegelka, S., Sra, S., Banerjee, A.: Approximation algorithms for tensor clustering. In: Gavaldà, R., Lugosi, G., Zeugmann, T., Zilles, S. (eds.) ALT 2009. LNCS, vol. 5809, pp. 368–383. Springer, Heidelberg (2009)
  11. Madeira, S.C., Oliveira, A.L.: Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans. Comput. Biology Bioinform. 1(1), 24–45 (2004)
  12. Supper, J., Strauch, M., Wanke, D., Harter, K., Zell, A.: EDISA: extracting biclusters from multiple time-series of gene expression profiles. BMC Bioinformatics 8, 334–347 (2007)
  13. Tavazoie, S., Hughes, J.D., Campbell, M.J., Cho, R.J., Church, G.: Systematic determination of genetic network architecture. Nat. Genet. 22(3), 281–285 (1999)

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Shuzhong Zhang (1)
  • Kun Wang (2)
  • Bilian Chen (3)
  • Xiuzhen Huang (2)

  1. Industrial and Systems Engineering Program, University of Minnesota, Minneapolis, USA
  2. Department of Computer Science, Arkansas State University, Jonesboro, USA
  3. Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, Hong Kong
