Abstract
We survey key concepts of weighted gene coexpression network analysis (WGCNA), also known as weighted correlation network analysis, and related data analysis strategies. We describe the construction of a weighted gene coexpression network from gene expression data, identification of network modules and integration of external data such as gene ontology information and clinical phenotype data. We review Differential Weighted Gene Coexpression Network Analysis (DWGCNA), a method for comparing and contrasting networks constructed from qualitatively different groups of samples. DWGCNA provides a means for measuring not only differential expression but also differential connectivity. Further, we show how to incorporate genetic marker data with expression data via Integrated Weighted Gene Coexpression Network Analysis (IWGCNA). Lastly, we describe R software implementing WGCNA methods.
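As a rough illustration of the network construction step summarized above, the following minimal sketch builds a weighted adjacency matrix from a toy expression table by raising absolute Pearson correlations to a soft-thresholding power, then sums each row to obtain a connectivity per gene. It is written in plain Python and is not the R software the chapter describes. The function names, the toy data and the power `beta=6` are assumptions made for this example only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def weighted_adjacency(expr, beta=6):
    """Unsigned weighted adjacency a_ij = |cor(x_i, x_j)| ** beta.

    expr: one list of sample values per gene (hypothetical toy layout).
    beta: soft-thresholding power; 6 is an illustrative choice only.
    The diagonal is left at 0 so that row sums give connectivity.
    """
    g = len(expr)
    adj = [[0.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            a = abs(pearson(expr[i], expr[j])) ** beta
            adj[i][j] = adj[j][i] = a
    return adj


def connectivity(adj):
    """Connectivity of each gene: sum of its adjacencies to all other genes."""
    return [sum(row) for row in adj]


# Toy data: genes 0 and 1 are almost perfectly correlated, gene 2 is not.
expr = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.1],
    [4.0, 1.0, 3.0, 2.0],
]
adj = weighted_adjacency(expr)
k = connectivity(adj)
```

Raising the correlation to a power keeps every gene pair in the network while pushing weak correlations toward zero, which is what distinguishes a weighted network from one built by hard thresholding.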
Acknowledgements
We would like to acknowledge the grant support from 1U19AI063603-01, 5P30CA016042-28, P50CA092131, and DK072206. The authors would like to thank UCLA collaborators Jun Dong, Jake Lusis, Tom Drake, Dan Geschwind, Wen Lin, Paul Mischel, Mike Oldham, and Wei Zhao for useful discussions.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Fuller, T., Langfelder, P., Presson, A., Horvath, S. (2011). Review of Weighted Gene Coexpression Network Analysis. In: Lu, HS., Schölkopf, B., Zhao, H. (eds) Handbook of Statistical Bioinformatics. Springer Handbooks of Computational Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16345-6_18
DOI: https://doi.org/10.1007/978-3-642-16345-6_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16344-9
Online ISBN: 978-3-642-16345-6
eBook Packages: Mathematics and Statistics (R0)