Markov chain Monte Carlo simulation of a Bayesian mixture model for gene network inference
Simultaneous measurement of expression levels for thousands of genes provides rich information about many aspects of biological mechanisms. A major computational challenge is to find methods that extract new biological insights from this wealth of data. Complex biological processes are often regulated differently under various conditions, and the associated gene interactions change dynamically with biological context. Thus, inferring such dynamic relationships between genes while accounting for biological conditions is very challenging.
In this study, we propose a comprehensive and integrated approach to infer the dynamic relationships between genes and evaluate this approach on three distinct gene networks.
This study demonstrates the advantage of integrating Markov chain Monte Carlo (MCMC) simulation into a Bayesian mixture model to overcome the high-dimension, low sample size (HDLSS) problem as well as to identify context-specific biological modules. Such biological modules were identified through the summarization of sampled network structures obtained from MCMC simulation.
This approach provides a more comprehensive understanding of dynamically regulated biological modules.
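The idea of sampling network structures with MCMC and then summarizing the samples can be illustrated with a minimal sketch. This is not the authors' implementation: the Bayesian mixture model score is replaced by a hypothetical per-edge log-odds table (`edge_log_odds`, with `default_log_odds` for unlisted gene pairs), the graphs are undirected, and the function names and parameters are assumptions made for illustration. The sampler is a plain Metropolis-Hastings chain with single-edge toggle proposals, and the summary is the fraction of retained samples that contain each edge.

```python
import itertools
import math
import random


def sample_edge_frequencies(n_genes, edge_log_odds, default_log_odds=-3.0,
                            n_iter=20000, burn_in=2000, seed=0):
    """Metropolis-Hastings over undirected graphs using single-edge toggles.

    ASSUMPTION: the log-score of a graph is the sum of the log-odds of its
    edges. This toy score stands in for a real model-based score.
    """
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(n_genes), 2))
    edges = set()                       # start from the empty graph
    counts = {p: 0 for p in pairs}
    kept = 0
    for step in range(n_iter):
        pair = rng.choice(pairs)        # symmetric proposal: toggle one edge
        w = edge_log_odds.get(pair, default_log_odds)
        # Change in log-score if the toggle is applied.
        delta = -w if pair in edges else w
        # Accept with probability min(1, exp(delta)).
        if delta >= 0 or rng.random() < math.exp(delta):
            edges.symmetric_difference_update({pair})
        if step >= burn_in:             # summarize only post-burn-in samples
            kept += 1
            for e in edges:
                counts[e] += 1
    return {p: c / kept for p, c in counts.items()}


def frequent_edges(freqs, threshold=0.5):
    """Return the edges whose sampled frequency reaches the threshold."""
    return sorted(p for p, f in freqs.items() if f >= threshold)
```

Calling `sample_edge_frequencies(4, {(0, 1): 3.0, (1, 2): 3.0})` and passing the result to `frequent_edges` keeps the edges that appear in most sampled graphs. In a real analysis the toy score would be replaced by the model's posterior score, and the frequently sampled edges would be grouped into candidate modules.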
Keywords: Markov chain Monte Carlo; Bayesian mixture model; Gene network
This study was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (Grant 2017R1D1A1B03032457) and the Hankuk University of Foreign Studies Research Fund (to Y.K.), and by the Ministry of Science and ICT of Korea (Grant 2014M3C9A3063544) and the Ministry of Education of Korea (Grant 2016R1D1A1B03930209) (to J.K.).
Compliance with ethical standards
Conflict of interest
Younhee Ko, Jaebum Kim, and Sandra L. Rodriguez-Zas declare that they have no conflict of interest.
This article does not contain any studies with human subjects or animals performed by any of the authors.