NetMix: A Network-Structured Mixture Model for Reduced-Bias Estimation of Altered Subnetworks

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2020)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 12074)

Abstract

A classic problem in computational biology is the identification of altered subnetworks: subnetworks of an interaction network that contain genes/proteins that are differentially expressed, highly mutated, or otherwise aberrant compared to other genes/proteins. Numerous methods have been developed to solve this problem under various assumptions, but the statistical properties of these methods are often unknown. For example, some widely-used methods are reported to output very large subnetworks that are difficult to interpret biologically. In this work, we formulate the identification of altered subnetworks as the problem of estimating the parameters of a class of probability distributions which we call the Altered Subset Distribution (ASD). We derive a connection between a popular method, jActiveModules, and the maximum likelihood estimator (MLE) of the ASD. We show that the MLE is statistically biased, explaining the large subnetworks output by jActiveModules. We introduce NetMix, an algorithm that uses Gaussian mixture models to obtain less biased estimates of the parameters of the ASD. We demonstrate that NetMix outperforms existing methods in identifying altered subnetworks on both simulated and real data, including the identification of differentially expressed genes from both microarray and RNA-seq experiments and the identification of cancer driver genes in somatic mutation data.
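As a rough illustration of the mixture-model idea in the abstract, the following sketch fits a two-component Gaussian mixture to simulated vertex scores and reads off the estimated fraction of altered vertices. It is not the NetMix implementation: the unit variances, the background component fixed at N(0, 1), the simulated data, and the function name `fit_two_component_gmm` are all assumptions made for this example, and the network structure is ignored entirely.

```python
import numpy as np


def fit_two_component_gmm(x, n_iter=200, tol=1e-8):
    """EM for the mixture alpha * N(mu, 1) + (1 - alpha) * N(0, 1).

    Hypothetical helper: both variances are assumed to equal 1 and the
    background mean is assumed to be 0. Returns (alpha, mu, resp), where
    resp holds the responsibilities from the last E-step.
    """
    x = np.asarray(x, dtype=float)
    # Crude starting values (an arbitrary choice for this sketch).
    alpha, mu = 0.1, max(float(x.max()) / 2.0, 0.1)
    for _ in range(n_iter):
        # E-step: posterior probability that each score is from N(mu, 1).
        log_alt = np.log(alpha) - 0.5 * (x - mu) ** 2
        log_bg = np.log1p(-alpha) - 0.5 * x ** 2
        resp = 1.0 / (1.0 + np.exp(log_bg - log_alt))
        # M-step: update the mixing weight and the shifted mean.
        new_alpha = float(np.clip(resp.mean(), 1e-6, 1 - 1e-6))
        new_mu = float((resp * x).sum() / resp.sum())
        converged = abs(new_alpha - alpha) + abs(new_mu - mu) < tol
        alpha, mu = new_alpha, new_mu
        if converged:
            break
    return alpha, mu, resp


# Simulated scores (assumed setup): 5% of vertices are shifted by mu = 3.
rng = np.random.default_rng(0)
n, alpha_true, mu_true = 5000, 0.05, 3.0
is_altered = rng.random(n) < alpha_true
scores = rng.normal(size=n) + mu_true * is_altered

alpha_hat, mu_hat, resp_hat = fit_two_component_gmm(scores)
print(f"estimated altered fraction: {alpha_hat:.3f}, estimated mean: {mu_hat:.2f}")
print(f"estimated number of altered vertices: {alpha_hat * n:.0f} "
      f"(simulated: {int(is_altered.sum())})")
```

In this simulated setting, `alpha_hat * n` can be compared directly with the number of scores that were actually shifted, which gives a size estimate that does not depend on searching over subsets.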

M. A. Reyna and U. Chitra—These authors contributed equally.

Notes

  1. jActiveModules actually maximizes a normalized scan statistic \(\varGamma _{\text {norm}}(S)\). We show in the full paper [65] that maximizing \(\varGamma _{\text {norm}}(S)\) is equivalent to maximizing the unnormalized scan statistic \(\varGamma (S)\) when the data are generated from normal distributions.

  2. Computing the scan statistic (2) requires maximizing a non-linear objective function, but for a fixed subnetwork size |S| the objective function is linear. We computed the scan statistic by modifying the ILP in heinz [24] to find a subnetwork of a fixed size, and running this ILP over all possible subnetwork sizes.
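Note 2 relies on the objective being linear once |S| is fixed. The sketch below illustrates that size-by-size strategy under two assumptions that are not stated on this page: the unnormalized scan statistic is taken to be the sum of the scores in S divided by the square root of |S|, and the connectivity constraint enforced by the ILP is dropped. Without connectivity, the linear objective for size k is maximized by the k largest scores, so one pass over the sorted scores covers all sizes. The function name `unconstrained_scan` is hypothetical.

```python
import numpy as np


def unconstrained_scan(x):
    """Maximize sum(x[S]) / sqrt(|S|) over all non-empty subsets S.

    Assumed form of the scan statistic; connectivity is ignored, so this
    is not a substitute for the fixed-size ILP described in the note.
    Returns (indices of the maximizing subset, maximum value).
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x)[::-1]            # indices sorted by decreasing score
    best_sum_per_size = np.cumsum(x[order])  # optimum of the linear objective for k = 1..n
    sizes = np.arange(1, len(x) + 1)
    values = best_sum_per_size / np.sqrt(sizes)
    k = int(np.argmax(values)) + 1         # size at which the statistic peaks
    return order[:k], float(values[k - 1])


example_scores = np.array([3.0, 2.5, -1.0, 0.2, 2.8])
subset, value = unconstrained_scan(example_scores)
print(sorted(subset.tolist()), round(value, 3))
```

With a network constraint, the inner step (the best subset of size k) is no longer a simple sort, which is where a fixed-size ILP of the kind described in the note comes in.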

References

  1. Addario-Berry, L., Broutin, N., Devroye, L., Lugosi, G., et al.: On combinatorial testing problems. Ann. Stat. 38(5), 3063–3092 (2010)

  2. Amgalan, B., Lee, H.: WMAXC: a weighted maximum clique method for identifying condition-specific sub-network. PLoS ONE 9(8), e104993 (2014)

  3. Arias-Castro, E., Candès, E.J., Durand, A.: Detection of an anomalous cluster in a network. Ann. Stat. 39(1), 278–304 (2011)

  4. Arias-Castro, E., Candès, E.J., Helgason, H., Zeitouni, O.: Searching for a trail of evidence in a maze. Ann. Stat. 36(4), 1726–1757 (2008)

  5. Arias-Castro, E., et al.: Adaptive multiscale detection of filamentary structures in a background of uniform random points. Ann. Stat. 34(1), 326–349 (2006)

  6. Arias-Castro, E., et al.: Distribution-free detection of structured anomalies: permutation and rank-based scans. J. Am. Stat. Assoc. 113(522), 789–801 (2018)

  7. Ayati, M., et al.: MOBAS: identification of disease-associated protein subnetworks using modularity-based scoring. EURASIP J. Bioinform. Syst. Biol. 2015, 7 (2015)

  8. Batra, R., Alcaraz, N., Gitzhofer, K., et al.: On the performance of de novo pathway enrichment. NPJ Syst. Biol. Appl. 3(1), 6 (2017)

  9. Berger, B., et al.: Computational solutions for omics data. Nat. Rev. Genet. 14(5), 333 (2013)

  10. Bhalla, U.S., Iyengar, R.: Emergent properties of networks of biological signaling pathways. Science 283(5400), 381–387 (1999)

  11. Califano, A., et al.: Leveraging models of cell regulation and GWAS data in integrative network-based association studies. Nat. Genet. 44(8), 841–847 (2012)

  12. Chasman, D., Siahpirani, A.F., Roy, S.: Network-based approaches for analysis of complex biological systems. Curr. Opin. Biotechnol. 39, 157–166 (2016)

  13. Chen, J.: Consistency of the MLE under mixture models. Statist. Sci. 32(1), 47–63 (2017)

  14. Cho, A., et al.: MUFFINN: cancer gene discovery via network analysis of somatic mutation data. Genome Biol. 17(1), 129 (2016)

  15. Cho, D.Y., Kim, Y.A., Przytycka, T.M.: Network biology approach to complex diseases. PLoS Comput. Biol. 8(12), 1–11 (2012)

  16. Choi, J., Shooshtari, P., Samocha, K.E., Daly, M.J., Cotsapas, C.: Network analysis of genome-wide selective constraint reveals a gene network active in early fetal brain intolerant of mutation. PLoS Genet. 12(6), e1006121 (2016)

  17. Chua, H.N., et al.: Exploiting indirect neighbours and topological weight to predict protein function from protein-protein interactions. Bioinformatics 22(13), 1623–1630 (2006)

  18. Cowen, L., Ideker, T., Raphael, B.J., Sharan, R.: Network propagation: a universal amplifier of genetic associations. Nat. Rev. Genet. 18(9), 551–562 (2017)

  19. Das, J., Yu, H.: HINT: high-quality protein interactomes and their applications in understanding human disease. BMC Syst. Biol. 6(1), 92 (2012)

  20. Daskalakis, C., et al.: Ten steps of EM suffice for mixtures of two Gaussians. In: Proceedings of the 2017 Conference on Learning Theory, pp. 704–710 (2017)

  21. Dempster, A.P., et al.: Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. Ser. B (Methodol.) 39(1), 1–38 (1977)

  22. Deng, M., et al.: Prediction of protein function using protein-protein interaction data. J. Comput. Biol. 10(6), 947–960 (2003)

  23. Dimitrakopoulos, C.M., Beerenwinkel, N.: Computational approaches for the identification of cancer genes and pathways. Wiley Interdisc. Rev. Syst. Biol. Med. 9(1), e1364 (2017)

  24. Dittrich, M.T., et al.: Identifying functional modules in protein-protein interaction networks: an integrated exact approach. Bioinformatics 24(13), i223–i231 (2008)

  25. de la Fuente, A.: From ‘differential expression’ to ‘differential networking’ - identification of dysfunctional regulatory networks in diseases. Trends Genet. 26(7), 326–333 (2010)

  26. Glaz, J., Naus, J., Wallenstein, S.: Scan Statistics. Springer, New York (2001). https://doi.org/10.1007/978-1-4757-3460-7

  27. Gligorijević, V., Pržulj, N.: Methods for biological data integration: perspectives and challenges. J. R. Soc. Interface 12(112), 20150571 (2015)

  28. Gulsuner, S., et al.: Spatial and temporal mapping of de novo mutations in schizophrenia to a fetal prefrontal cortical network. Cell 154(3), 518–529 (2013)

  29. Guo, M., et al.: SLICE: determining cell differentiation and lineage based on single cell entropy. Nucleic Acids Res. 45(7), e54 (2016)

  30. Guo, Z., et al.: Edge-based scoring and searching method for identifying condition-responsive protein-protein interaction sub-network. Bioinformatics 23(16), 2121–2128 (2007)

  31. Halldórsson, B.V., Sharan, R.: Network-based interpretation of genomic variation data. J. Mol. Biol. 425(21), 3964–3969 (2013)

  32. He, H., Lin, D., Zhang, J., Wang, Y., Deng, H.W.: Comparison of statistical methods for subnetwork detection in the integration of gene expression and protein interaction network. BMC Bioinformatics 18(1), 149 (2017)

  33. Head, M.L., Holman, L., Lanfear, R., Kahn, A.T., Jennions, M.D.: The extent and consequences of P-Hacking in science. PLoS Biol. 13(3), e1002106 (2015)

  34. Hofree, M., Shen, J.P., Carter, H., Gross, A., Ideker, T.: Network-based stratification of tumor mutations. Nat. Methods 10(11), 1108–1115 (2013)

  35. Hormozdiari, F., et al.: The discovery of integrated gene networks for autism and related disorders. Genome Res. 25(1), 142–154 (2015)

  36. Horn, H., Lawrence, M.S., et al.: NetSig: network-based discovery from cancer genomes. Nat. Methods 15(1), 61–66 (2017)

  37. Huang, J.K., Carlin, D.E., et al.: Systematic evaluation of molecular networks for discovery of disease genes. Cell Syst. 6(4), 484–495 (2018)

  38. Hung, H.M.J., O’Neill, R.T., Bauer, P., Kohne, K.: The behavior of the P-value when the alternative hypothesis is true. Biometrics 53(1), 11–22 (1997)

  39. Hung, J.H., et al.: Gene set enrichment analysis: performance evaluation and usage guidelines. Brief. Bioinform. 13(3), 281–291 (2011)

  40. Ideker, T., et al.: Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics 18(suppl 1), S233–S240 (2002)

  41. Ioannidis, J.P.: Why most published research findings are false. PLoS Med. 2(8), e124 (2005)

  42. Kelley, B.P., Yuan, B., Lewitter, F., Sharan, R., Stockwell, B.R., Ideker, T.: PathBLAST: a tool for alignment of protein interaction networks. Nucleic Acids Res. 32(suppl 2), W83–W88 (2004)

  43. Kim, M., Hwang, D.: Network-based protein biomarker discovery platforms. Genomics Inform. 14(1), 2 (2016)

  44. Klimm, F., et al.: Functional module detection through integration of single-cell RNA sequencing data with protein-protein interaction networks. bioRxiv (2019)

  45. Kulldorff, M.: A spatial scan statistic. Commun. Stat. Theor. Methods 26(6), 1481–1496 (1997)

  46. Lee, I., et al.: Prioritizing candidate disease genes by network-based boosting of genome-wide association data. Genome Res. 21(7), 1109–1121 (2011)

  47. Leiserson, M.D., Eldridge, J.V., Ramachandran, S., Raphael, B.J.: Network analysis of GWAS data. Curr. Opin. Genet. Dev. 23(6), 602–610 (2013)

  48. Leiserson, M.D., et al.: Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nat. Genet. 47(2), 106–114 (2015)

  49. Liu, J.J., Sharma, K., Zangrandi, L., et al.: In vivo brain GPCR signaling elucidated by phosphoproteomics. Science 360(6395) (2018)

  50. Lu, X., Bressan, S.: Sampling connected induced subgraphs uniformly at random. In: Ailamaki, A., Bowers, S. (eds.) SSDBM 2012. LNCS, vol. 7338, pp. 195–212. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31235-9_13

  51. Luo, Y., Zhao, X., et al.: A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nat. Commun. 8(1), 573 (2017)

  52. McLachlan, G., et al.: A simple implementation of a normal mixture approach to differential gene expression in multiclass microarrays. Bioinformatics 22(13), 1608–1615 (2006)

  53. Menche, J., et al.: Disease networks. Uncovering disease-disease relationships through the incomplete interactome. Science 347(6224), 1257601–1257601 (2015)

  54. Mitra, K., et al.: Integrative approaches for finding modular structure in biological networks. Nat. Rev. Genet. 14, 719 (2013)

  55. Mutation Consequences and Pathway Analysis Working Group of the International Cancer Genome Consortium, et al.: Pathway and network analysis of cancer genomes. Nat. Methods 12, 615 (2015)

  56. Nabieva, E., et al.: Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps. Bioinformatics 21, i302–i310 (2005)

  57. Nibbe, R.K., Koyutürk, M., Chance, M.R.: An integrative-omics approach to identify functional sub-networks in human colorectal cancer. PLoS Comput. Biol. 6(1), e1000639 (2010)

  58. Nikolayeva, I., Pla, O.G., Schwikowski, B.: Network module identification-a widespread theoretical bias and best practices. Methods 132, 19–25 (2018)

  59. Nuzzo, R.: How scientists fool themselves-and how they can stop. Nat. News 526(7572), 182 (2015)

  60. Pan, W., et al.: A mixture model approach to detecting differentially expressed genes with microarray data. Funct. Integr. Genomics 3(3), 117–124 (2003). https://doi.org/10.1007/s10142-003-0085-7

  61. Petryszak, R., et al.: Expression atlas update: an integrated database of gene and protein expression in humans, animals and plants. Nucleic Acids Res. 44(D1), D746–D752 (2015)

  62. Pounds, S., Cheng, C.: Improving false discovery rate estimation. Bioinformatics 20(11), 1737–1745 (2004)

  63. Pounds, S., Morris, S.W.: Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of p-values. Bioinformatics 19(10), 1236–1242 (2003)

  64. Radivojac, P., Clark, W.T., et al.: A large-scale evaluation of computational protein function prediction. Nat. Methods 10(3), 221 (2013)

  65. Reyna, M.A., Chitra, U., et al.: NetMix: a network-structured mixture model for reduced-bias estimation of altered subnetworks. bioRxiv (2020). https://www.biorxiv.org/content/early/2020/01/19/2020.01.18.911438

  66. Rolland, T., et al.: A proteome-scale map of the human interactome network. Cell 159(5), 1212–1226 (2014)

  67. Roy, S., Ernst, J.O.: Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science 330(6012), 1787–1797 (2010)

  68. Shannon, P., et al.: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13(11), 2498–2504 (2003)

  69. Sharan, R., Ulitsky, I., Shamir, R.: Network-based prediction of protein function. Mol. Syst. Biol. 3(1), 88 (2007)

  70. Sharpnack, J., Singh, A.: Near-optimal and computationally efficient detectors for weak and sparse graph-structured patterns. In: IEEE GlobalSIP (2013)

  71. Sharpnack, J., Singh, A., Rinaldo, A.: Changepoint detection over graphs with the spectral scan statistic. In: Artificial Intelligence and Statistics, pp. 545–553 (2013)

  72. Sharpnack, J., et al.: Detecting anomalous activity on networks with the graph Fourier scan statistic. IEEE Trans. Signal Process. 64(2), 364–379 (2016)

  73. Sharpnack, J.L., et al.: Near-optimal anomaly detection in graphs using Lovász extended scan statistic. In: Advances in Neural Information Processing Systems (2013)

  74. Shrestha, R., Hodzic, E., et al.: HIT’nDRIVE: patient-specific multidriver gene prioritization for precision oncology. Genome Res. 27(9), 1573–1588 (2017)

  75. Soul, J., et al.: PhenomeExpress: a refined network analysis of expression datasets by inclusion of known disease phenotypes. Sci. Rep. 5, 8117 (2015)

  76. Vandin, F., Upfal, E., Raphael, B.J.: Algorithms for detecting significantly mutated pathways in cancer. J. Comput. Biol. 18(3), 507–522 (2011)

  77. Vanunu, O., et al.: Associating genes and protein complexes with disease via network propagation. PLoS Comput. Biol. 6(1), e1000641 (2010)

  78. Wang, X., et al.: HTSanalyzeR: an R/Bioconductor package for integrated network analysis of high-throughput screens. Bioinformatics 27(6), 879–880 (2011)

  79. Wang, Y.H., Bower, N.I., et al.: Gene expression patterns during intramuscular fat development in cattle. J. Anim. Sci. 87(1), 119–130 (2009)

  80. Xia, J., et al.: NetworkAnalyst for statistical, visual and network-based meta-analysis of gene expression data. Nat. Protoc. 10, 823 (2015)

  81. Xu, J., Hsu, D., Maleki, A.: Global analysis of expectation maximization for mixtures of two Gaussians. In: Advances in Neural Information Processing Systems (2016)

Acknowledgments

We thank Mohammed El-Kebir for assistance with implementing jActiveModules* by modifying the ILP in heinz. We thank David Tse for directing us to the network anomaly literature. M.A.R. was supported in part by the National Cancer Institute of the NIH (Cancer Target Discovery and Development Network grant U01CA217875). B.J.R. was supported by US National Institutes of Health (NIH) grants R01HG007069 and U24CA211000.

Corresponding author

Correspondence to Benjamin J. Raphael.

Electronic supplementary material

Supplementary material 1 (PDF, 2626 KB)

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Reyna, M.A., Chitra, U., Elyanow, R., Raphael, B.J. (2020). NetMix: A Network-Structured Mixture Model for Reduced-Bias Estimation of Altered Subnetworks. In: Schwartz, R. (eds) Research in Computational Molecular Biology. RECOMB 2020. Lecture Notes in Computer Science, vol 12074. Springer, Cham. https://doi.org/10.1007/978-3-030-45257-5_11

  • DOI: https://doi.org/10.1007/978-3-030-45257-5_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-45256-8

  • Online ISBN: 978-3-030-45257-5

  • eBook Packages: Computer Science, Computer Science (R0)
