Building Networks with Microarray Data

  • Bradley M. Broom
  • Waree Rinsurongkawong
  • Lajos Pusztai
  • Kim-Anh Do
Part of the Methods in Molecular Biology book series (MIMB, volume 620)


This chapter describes methods for learning gene interaction networks from high-throughput gene expression data sets. Many genes have unknown or poorly understood functions and interactions, especially in diseases such as cancer, where the genome is frequently mutated. The gene interactions inferred by learning a network model from the data can form the basis of hypotheses that can be tested in subsequent biological experiments. This chapter focuses on Bayesian network models, which carry more mathematical detail than purely conceptual models but less than detailed differential equation models. From a network learning perspective, the most severe problem with microarray data is the limited sample size: there are usually many plausible networks for modeling the system, and they cannot be reliably distinguished using the number of samples found in current microarray data sets. We therefore describe robust network learning strategies for reducing the number of false interactions detected. We first perform preliminary clustering using co-expression network analysis and gene shaving, and then construct Bayesian networks to obtain a global perspective of the relationships between the resulting gene clusters. Throughout the chapter, we illustrate these concepts with a running example based on a publicly available breast cancer data set.

Key words

Bayesian network; co-expression network; microarray; cancer; scale-free topology; gene modules; gene shaving; bagging; Bayesian bootstrap



Kim-Anh Do was partially funded by the National Institutes of Health via the University of Texas SPORE in Breast Cancer (CA-116199) and the Cancer Center Support Grant (CA016672).



Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Bradley M. Broom (1)
  • Waree Rinsurongkawong (2)
  • Lajos Pusztai (3)
  • Kim-Anh Do (2)

  1. Department of Bioinformatics and Computational Biology, University of Texas M. D. Anderson Cancer Center, Houston, USA
  2. Department of Biostatistics, University of Texas M. D. Anderson Cancer Center, Houston, USA
  3. Department of Breast Medical Oncology, University of Texas M. D. Anderson Cancer Center, Houston, USA
