Abstract
A classic problem in computational biology is the identification of altered subnetworks: subnetworks of an interaction network that contain genes/proteins that are differentially expressed, highly mutated, or otherwise aberrant compared to other genes/proteins. Numerous methods have been developed to solve this problem under various assumptions, but the statistical properties of these methods are often unknown. For example, some widely used methods are reported to output very large subnetworks that are difficult to interpret biologically. In this work, we formulate the identification of altered subnetworks as the problem of estimating the parameters of a class of probability distributions that we call the Altered Subset Distribution (ASD). We derive a connection between a popular method, jActiveModules, and the maximum likelihood estimator (MLE) of the ASD. We show that the MLE is statistically biased, which explains the large subnetworks output by jActiveModules. We introduce NetMix, an algorithm that uses Gaussian mixture models to obtain less biased estimates of the parameters of the ASD. We demonstrate that NetMix outperforms existing methods in identifying altered subnetworks on both simulated and real data, including the identification of differentially expressed genes from both microarray and RNA-seq experiments and the identification of cancer driver genes in somatic mutation data.
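The abstract states that NetMix uses Gaussian mixture models to estimate the parameters of the ASD. As a rough illustration of that mixture-fitting idea only, the Python sketch below fits a two-component mixture by expectation-maximization (EM) to simulated scores. It assumes a simplified Gaussian ASD in which background scores follow N(0, 1) and altered scores follow N(μ, 1), holds both variances at 1, and ignores the interaction network entirely. The function names `simulate_asd` and `fit_two_component_gmm` and all parameter values are hypothetical; this is not the NetMix implementation, which also exploits the network structure.

```python
import math
import random


def simulate_asd(n, n_altered, mu, seed=0):
    """Draw scores from a simplified Gaussian ASD (assumption): the first
    n_altered entries ~ N(mu, 1), the remaining entries ~ N(0, 1)."""
    rng = random.Random(seed)
    altered = [rng.gauss(mu, 1.0) for _ in range(n_altered)]
    background = [rng.gauss(0.0, 1.0) for _ in range(n - n_altered)]
    return altered + background


def normal_pdf(x, mean):
    """Density of N(mean, 1) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def fit_two_component_gmm(x, n_iter=200, tol=1e-8):
    """EM for the mixture (1 - alpha) * N(0, 1) + alpha * N(mu, 1).

    Only alpha and mu are estimated; the background component is held fixed
    at N(0, 1) and both variances at 1 (simplifying assumptions).
    Returns (alpha, mu, responsibilities)."""
    # Initialize mu as the mean of the top decile of scores, alpha at 10%.
    top = sorted(x, reverse=True)[: max(1, len(x) // 10)]
    mu, alpha = sum(top) / len(top), 0.1
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each score is altered.
        resp = []
        for xi in x:
            a = alpha * normal_pdf(xi, mu)
            b = (1.0 - alpha) * normal_pdf(xi, 0.0)
            resp.append(a / (a + b))
        # M-step: closed-form updates for the mixing weight and the mean.
        total = sum(resp)
        new_alpha = total / len(x)
        new_mu = sum(r * xi for r, xi in zip(resp, x)) / total
        converged = abs(new_alpha - alpha) < tol and abs(new_mu - mu) < tol
        alpha, mu = new_alpha, new_mu
        if converged:
            break
    return alpha, mu, resp


if __name__ == "__main__":
    scores = simulate_asd(n=2000, n_altered=100, mu=4.0, seed=1)
    alpha_hat, mu_hat, _ = fit_two_component_gmm(scores)
    print(f"alpha_hat={alpha_hat:.3f}  mu_hat={mu_hat:.2f}  "
          f"estimated altered count={alpha_hat * len(scores):.0f}")
```

In this toy setting, the product of the estimated mixing weight and n gives an estimate of the number of altered genes directly from the score distribution; with well-separated components it should land near the 100 simulated altered scores.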
M. A. Reyna and U. Chitra contributed equally to this work.
Notes
1. jActiveModules actually maximizes a normalized scan statistic \(\varGamma_{\text{norm}}(S)\). We show in the full paper [65] that maximizing \(\varGamma_{\text{norm}}(S)\) is equivalent to maximizing the unnormalized scan statistic \(\varGamma(S)\) when the data are generated from normal distributions.
2. The scan statistic (2) is the maximum of an objective function that is nonlinear in \(S\), but for a fixed subnetwork size \(|S|\) the objective is linear. We computed the scan statistic by modifying the ILP in heinz [24] to find the highest-scoring subnetwork of a fixed size, and running this ILP over all possible subnetwork sizes.
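Note 2 can be made concrete with a small Python sketch. It assumes that the scan statistic has the aggregate z-score form \(\varGamma(S) = \sum_{v \in S} x_v / \sqrt{|S|}\) used by jActiveModules, where \(x_v\) is the score of gene \(v\), and that the ASD is Gaussian with unit variance (altered scores N(μ, 1), background N(0, 1)). Under that assumption, the log-likelihood ratio of a candidate \(S\), maximized over μ ≥ 0, equals \(\max(0, \varGamma(S))^2/2\), so the MLE of \(S\) is a maximizer of \(\varGamma(S)\). The sketch also drops the connectivity constraint, so each fixed-size subproblem is solved by taking the \(k\) largest scores rather than by the heinz ILP. The function name `scan_statistic` and the simulation settings are hypothetical.

```python
import math
import random


def scan_statistic(scores):
    """Maximize Gamma(S) = sum_{v in S} x_v / sqrt(|S|) over nonempty S.

    Simplification (assumption): no connectivity constraint, i.e. the graph
    is treated as complete. For a fixed size k the numerator is linear in S
    and is maximized by the k largest scores, so sorting replaces the
    fixed-size ILP; looping over every k then yields the global optimum.
    Returns (best_gamma, best_k, indices of the selected subset)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best_gamma, best_k, prefix = -math.inf, 0, 0.0
    for k, idx in enumerate(order, start=1):
        prefix += scores[idx]            # sum of the k largest scores
        gamma = prefix / math.sqrt(k)    # best Gamma among subsets of size k
        if gamma > best_gamma:
            best_gamma, best_k = gamma, k
    return best_gamma, best_k, order[:best_k]


if __name__ == "__main__":
    # Illustrative simulation (not from the paper): 100 altered scores
    # ~ N(3, 1) among 2000 total, the rest ~ N(0, 1).
    rng = random.Random(0)
    n, n_altered = 2000, 100
    scores = [rng.gauss(3.0, 1.0) for _ in range(n_altered)]
    scores += [rng.gauss(0.0, 1.0) for _ in range(n - n_altered)]
    gamma, k, _ = scan_statistic(scores)
    print(f"max Gamma = {gamma:.1f}, |S| = {k} (true altered count: {n_altered})")
```

With these illustrative settings the selected \(|S|\) is typically several times the 100 simulated altered scores. Adding a score \(x\) to \(S\) increases \(\varGamma(S)\) whenever \(x\) exceeds roughly half the current mean of \(S\), so many background scores with moderately positive values are absorbed; this is the kind of size bias the abstract attributes to the MLE.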
References
Addario-Berry, L., Broutin, N., Devroye, L., Lugosi, G., et al.: On combinatorial testing problems. Ann. Stat. 38(5), 3063–3092 (2010)
Amgalan, B., Lee, H.: WMAXC: a weighted maximum clique method for identifying condition-specific sub-network. PLoS ONE 9(8), e104993 (2014)
Arias-Castro, E., Candès, E.J., Durand, A.: Detection of an anomalous cluster in a network. Ann. Stat. 39(1), 278–304 (2011)
Arias-Castro, E., Candès, E.J., Helgason, H., Zeitouni, O.: Searching for a trail of evidence in a maze. Ann. Stat. 36(4), 1726–1757 (2008)
Arias-Castro, E., et al.: Adaptive multiscale detection of filamentary structures in a background of uniform random points. Ann. Stat. 34(1), 326–349 (2006)
Arias-Castro, E., et al.: Distribution-free detection of structured anomalies: permutation and rank-based scans. J. Am. Stat. Assoc. 113(522), 789–801 (2018)
Ayati, M., et al.: MOBAS: identification of disease-associated protein subnetworks using modularity-based scoring. EURASIP J. Bioinform. Syst. Biol. 2015, 7 (2015)
Batra, R., Alcaraz, N., Gitzhofer, K., et al.: On the performance of de novo pathway enrichment. NPJ Syst. Biol. Appl. 3(1), 6 (2017)
Berger, B., et al.: Computational solutions for omics data. Nat. Rev. Genet. 14(5), 333 (2013)
Bhalla, U.S., Iyengar, R.: Emergent properties of networks of biological signaling pathways. Science 283(5400), 381–387 (1999)
Califano, A., et al.: Leveraging models of cell regulation and GWAS data in integrative network-based association studies. Nat. Genet. 44(8), 841–847 (2012)
Chasman, D., Siahpirani, A.F., Roy, S.: Network-based approaches for analysis of complex biological systems. Curr. Opin. Biotechnol. 39, 157–166 (2016)
Chen, J.: Consistency of the MLE under mixture models. Statist. Sci. 32(1), 47–63 (2017)
Cho, A., et al.: MUFFINN: cancer gene discovery via network analysis of somatic mutation data. Genome Biol. 17(1), 129 (2016)
Cho, D.Y., Kim, Y.A., Przytycka, T.M.: Network biology approach to complex diseases. PLoS Comput. Biol. 8(12), 1–11 (2012)
Choi, J., Shooshtari, P., Samocha, K.E., Daly, M.J., Cotsapas, C.: Network analysis of genome-wide selective constraint reveals a gene network active in early fetal brain intolerant of mutation. PLoS Genet. 12(6), e1006121 (2016)
Chua, H.N., et al.: Exploiting indirect neighbours and topological weight to predict protein function from protein-protein interactions. Bioinformatics 22(13), 1623–1630 (2006)
Cowen, L., Ideker, T., Raphael, B.J., Sharan, R.: Network propagation: a universal amplifier of genetic associations. Nat. Rev. Genet. 18(9), 551–562 (2017)
Das, J., Yu, H.: HINT: high-quality protein interactomes and their applications in understanding human disease. BMC Syst. Biol. 6(1), 92 (2012)
Daskalakis, C., et al.: Ten steps of EM suffice for mixtures of two Gaussians. In: Proceedings of the 2017 Conference on Learning Theory, pp. 704–710 (2017)
Dempster, A.P., et al.: Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. Ser. B (Methodol.) 39(1), 1–38 (1977)
Deng, M., et al.: Prediction of protein function using protein-protein interaction data. J. Comput. Biol. 10(6), 947–960 (2003)
Dimitrakopoulos, C.M., Beerenwinkel, N.: Computational approaches for the identification of cancer genes and pathways. Wiley Interdisc. Rev. Syst. Biol. Med. 9(1), e1364 (2017)
Dittrich, M.T., et al.: Identifying functional modules in protein-protein interaction networks: an integrated exact approach. Bioinformatics 24(13), i223–i231 (2008)
de la Fuente, A.: From ‘differential expression’ to ‘differential networking’ - identification of dysfunctional regulatory networks in diseases. Trends Genet. 26(7), 326–333 (2010)
Glaz, J., Naus, J., Wallenstein, S.: Scan Statistics. Springer, New York (2001). https://doi.org/10.1007/978-1-4757-3460-7
Gligorijević, V., Pržulj, N.: Methods for biological data integration: perspectives and challenges. J. R. Soc. Interface 12(112), 20150571 (2015)
Gulsuner, S., et al.: Spatial and temporal mapping of de novo mutations in schizophrenia to a fetal prefrontal cortical network. Cell 154(3), 518–529 (2013)
Guo, M., et al.: SLICE: determining cell differentiation and lineage based on single cell entropy. Nucleic Acids Res. 45(7), e54 (2016)
Guo, Z., et al.: Edge-based scoring and searching method for identifying condition-responsive protein-protein interaction sub-network. Bioinformatics 23(16), 2121–2128 (2007)
Halldórsson, B.V., Sharan, R.: Network-based interpretation of genomic variation data. J. Mol. Biol. 425(21), 3964–3969 (2013)
He, H., Lin, D., Zhang, J., Wang, Y., Deng, H.W.: Comparison of statistical methods for subnetwork detection in the integration of gene expression and protein interaction network. BMC Bioinformatics 18(1), 149 (2017)
Head, M.L., Holman, L., Lanfear, R., Kahn, A.T., Jennions, M.D.: The extent and consequences of P-Hacking in science. PLoS Biol. 13(3), e1002106 (2015)
Hofree, M., Shen, J.P., Carter, H., Gross, A., Ideker, T.: Network-based stratification of tumor mutations. Nat. Methods 10(11), 1108–1115 (2013)
Hormozdiari, F., et al.: The discovery of integrated gene networks for autism and related disorders. Genome Res. 25(1), 142–154 (2015)
Horn, H., Lawrence, M.S., et al.: NetSig: network-based discovery from cancer genomes. Nat. Methods 15(1), 61–66 (2017)
Huang, J.K., Carlin, D.E., et al.: Systematic evaluation of molecular networks for discovery of disease genes. Cell Syst. 6(4), 484–495 (2018)
Hung, H.M.J., O’Neill, R.T., Bauer, P., Kohne, K.: The behavior of the P-value when the alternative hypothesis is true. Biometrics 53(1), 11–22 (1997)
Hung, J.H., et al.: Gene set enrichment analysis: performance evaluation and usage guidelines. Brief. Bioinform. 13(3), 281–291 (2011)
Ideker, T., et al.: Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics 18(suppl 1), S233–S240 (2002)
Ioannidis, J.P.: Why most published research findings are false. PLoS Med. 2(8), e124 (2005)
Kelley, B.P., Yuan, B., Lewitter, F., Sharan, R., Stockwell, B.R., Ideker, T.: PathBLAST: a tool for alignment of protein interaction networks. Nucleic Acids Res. 32(suppl 2), W83–W88 (2004)
Kim, M., Hwang, D.: Network-based protein biomarker discovery platforms. Genomics Inform. 14(1), 2 (2016)
Klimm, F., et al.: Functional module detection through integration of single-cell RNA sequencing data with protein-protein interaction networks. bioRxiv (2019)
Kulldorff, M.: A spatial scan statistic. Commun. Stat. Theor. Methods 26(6), 1481–1496 (1997)
Lee, I., et al.: Prioritizing candidate disease genes by network-based boosting of genome-wide association data. Genome Res. 21(7), 1109–1121 (2011)
Leiserson, M.D., Eldridge, J.V., Ramachandran, S., Raphael, B.J.: Network analysis of GWAS data. Curr. Opin. Genet. Dev. 23(6), 602–610 (2013)
Leiserson, M.D., et al.: Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nat. Genet. 47(2), 106–114 (2015)
Liu, J.J., Sharma, K., Zangrandi, L., et al.: In vivo brain GPCR signaling elucidated by phosphoproteomics. Science 360(6395) (2018)
Lu, X., Bressan, S.: Sampling connected induced subgraphs uniformly at random. In: Ailamaki, A., Bowers, S. (eds.) SSDBM 2012. LNCS, vol. 7338, pp. 195–212. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31235-9_13
Luo, Y., Zhao, X., et al.: A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nat. Commun. 8(1), 573 (2017)
McLachlan, G., et al.: A simple implementation of a normal mixture approach to differential gene expression in multiclass microarrays. Bioinformatics 22(13), 1608–1615 (2006)
Menche, J., et al.: Disease networks. Uncovering disease-disease relationships through the incomplete interactome. Science 347(6224), 1257601 (2015)
Mitra, K., et al.: Integrative approaches for finding modular structure in biological networks. Nat. Rev. Genet. 14, 719 (2013)
Mutation Consequences and Pathway Analysis Working Group of the International Cancer Genome Consortium, et al.: Pathway and network analysis of cancer genomes. Nat. Methods 12, 615 (2015)
Nabieva, E., et al.: Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps. Bioinformatics 21, i302–i310 (2005)
Nibbe, R.K., Koyutürk, M., Chance, M.R.: An integrative-omics approach to identify functional sub-networks in human colorectal cancer. PLoS Comput. Biol. 6(1), e1000639 (2010)
Nikolayeva, I., Pla, O.G., Schwikowski, B.: Network module identification-a widespread theoretical bias and best practices. Methods 132, 19–25 (2018)
Nuzzo, R.: How scientists fool themselves-and how they can stop. Nat. News 526(7572), 182 (2015)
Pan, W., et al.: A mixture model approach to detecting differentially expressed genes with microarray data. Funct. Integr. Genomics 3(3), 117–124 (2003). https://doi.org/10.1007/s10142-003-0085-7
Petryszak, R., et al.: Expression atlas update: an integrated database of gene and protein expression in humans, animals and plants. Nucleic Acids Res. 44(D1), D746–D752 (2015)
Pounds, S., Cheng, C.: Improving false discovery rate estimation. Bioinformatics 20(11), 1737–1745 (2004)
Pounds, S., Morris, S.W.: Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of p-values. Bioinformatics 19(10), 1236–1242 (2003)
Radivojac, P., Clark, W.T., et al.: A large-scale evaluation of computational protein function prediction. Nat. Methods 10(3), 221 (2013)
Reyna, M.A., Chitra, U., et al.: NetMix: a network-structured mixture model for reduced-bias estimation of altered subnetworks. bioRxiv (2020). https://www.biorxiv.org/content/early/2020/01/19/2020.01.18.911438
Rolland, T., et al.: A proteome-scale map of the human interactome network. Cell 159(5), 1212–1226 (2014)
Roy, S., Ernst, J.O.: Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science 330(6012), 1787–1797 (2010)
Shannon, P., et al.: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13(11), 2498–2504 (2003)
Sharan, R., Ulitsky, I., Shamir, R.: Network-based prediction of protein function. Mol. Syst. Biol. 3(1), 88 (2007)
Sharpnack, J., Singh, A.: Near-optimal and computationally efficient detectors for weak and sparse graph-structured patterns. In: IEEE GlobalSIP (2013)
Sharpnack, J., Singh, A., Rinaldo, A.: Changepoint detection over graphs with the spectral scan statistic. In: Artificial Intelligence and Statistics, pp. 545–553 (2013)
Sharpnack, J., et al.: Detecting anomalous activity on networks with the graph Fourier scan statistic. IEEE Trans. Signal Process. 64(2), 364–379 (2016)
Sharpnack, J.L., et al.: Near-optimal anomaly detection in graphs using Lovász extended scan statistic. In: Advances in Neural Information Processing Systems (2013)
Shrestha, R., Hodzic, E., et al.: HIT’nDRIVE: patient-specific multidriver gene prioritization for precision oncology. Genome Res. 27(9), 1573–1588 (2017)
Soul, J., et al.: PhenomeExpress: a refined network analysis of expression datasets by inclusion of known disease phenotypes. Sci. Rep. 5, 8117 (2015)
Vandin, F., Upfal, E., Raphael, B.J.: Algorithms for detecting significantly mutated pathways in cancer. J. Comput. Biol. 18(3), 507–522 (2011)
Vanunu, O., et al.: Associating genes and protein complexes with disease via network propagation. PLoS Comput. Biol. 6(1), e1000641 (2010)
Wang, X., et al.: HTSanalyzeR: an R/Bioconductor package for integrated network analysis of high-throughput screens. Bioinformatics 27(6), 879–880 (2011)
Wang, Y.H., Bower, N.I., et al.: Gene expression patterns during intramuscular fat development in cattle. J. Anim. Sci. 87(1), 119–130 (2009)
Xia, J., et al.: NetworkAnalyst for statistical, visual and network-based meta-analysis of gene expression data. Nat. Protoc. 10, 823 (2015)
Xu, J., Hsu, D., Maleki, A.: Global analysis of expectation maximization for mixtures of two Gaussians. In: Advances in Neural Information Processing Systems (2016)
Acknowledgments
We thank Mohammed El-Kebir for assistance with implementing jActiveModules* by modifying the ILP in heinz. We thank David Tse for directing us to the network anomaly literature. M.A.R. was supported in part by the National Cancer Institute of the NIH (Cancer Target Discovery and Development Network grant U01CA217875). B.J.R. was supported by US National Institutes of Health (NIH) grants R01HG007069 and U24CA211000.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Reyna, M.A., Chitra, U., Elyanow, R., Raphael, B.J. (2020). NetMix: A Network-Structured Mixture Model for Reduced-Bias Estimation of Altered Subnetworks. In: Schwartz, R. (ed.) Research in Computational Molecular Biology. RECOMB 2020. Lecture Notes in Computer Science, vol 12074. Springer, Cham. https://doi.org/10.1007/978-3-030-45257-5_11
DOI: https://doi.org/10.1007/978-3-030-45257-5_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-45256-8
Online ISBN: 978-3-030-45257-5
eBook Packages: Computer Science, Computer Science (R0)