Bi-clustering of Gene Expression Data Using Conditional Entropy

  • Afolabi Olomola
  • Sumeet Dua
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5780)

Abstract

Gene expression data are inherently sparse, and genes rarely exhibit similar expression patterns across a wide range of conditions, which makes traditional clustering techniques unsuitable for gene expression analysis. Biclustering methods currently used to identify correlated gene patterns over a subset of conditions do not effectively mine constant, coherent, or overlapping biclusters, partly because they perform poorly in the presence of noise. In this paper, we present a new methodology (BiEntropy) that combines information entropy and graph theory techniques to identify co-expressed gene patterns that are relevant to a subset of the samples. Our goal is to discover different types of biclusters in the presence of noise and to show that our method outperforms existing methods at discovering functionally enriched biclusters. We demonstrate the effectiveness of our method using both synthetic and real data.

Keywords

Gene expression, biclustering, conditional entropy

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Afolabi Olomola (1)
  • Sumeet Dua (1, 2)

  1. Data Mining Research Laboratory (DMRL), Department of Computer Science, Louisiana Tech University, Ruston, U.S.A.
  2. School of Medicine, Louisiana State University Health Sciences, New Orleans, U.S.A.
