Markov chain Monte Carlo simulation of a Bayesian mixture model for gene network inference

  • Younhee Ko
  • Jaebum Kim
  • Sandra L. Rodriguez-Zas
Research Article



Simultaneous measurement of expression levels for thousands of genes yields rich information about many aspects of biological mechanisms. A major computational challenge is to extract new biological insights from this wealth of data. Complex biological processes are often regulated differently under various conditions, and the associated gene interactions change dynamically with the biological context. Inferring such dynamic relationships between genes while accounting for biological conditions is therefore very challenging.


In this study, we propose a comprehensive and integrated approach to infer the dynamic relationships between genes and evaluate this approach on three distinct gene networks.


This study demonstrates the advantage of integrating Markov chain Monte Carlo (MCMC) simulation into a Bayesian mixture model to overcome the high-dimension, low sample size (HDLSS) problem as well as to identify context-specific biological modules. Such biological modules were identified through the summarization of sampled network structures obtained from MCMC simulation.
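The sampler and scoring function used in the study are not described in this abstract, so the following is only a minimal sketch of the general idea: a Metropolis-Hastings walk over directed acyclic network structures, followed by a summary of how often each edge appears in the sampled structures. The function names, the single-edge toggle proposal, the 0.5 frequency threshold, and in particular `toy_log_score` (a placeholder standing in for a mixture-model score) are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def is_acyclic(edges, n_nodes):
    """Return True if the directed graph has no cycle (Kahn's algorithm)."""
    indegree = [0] * n_nodes
    children = [[] for _ in range(n_nodes)]
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    queue = [v for v in range(n_nodes) if indegree[v] == 0]
    visited = 0
    while queue:
        v = queue.pop()
        visited += 1
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return visited == n_nodes


def toy_log_score(edges, favoured, reward=2.0, penalty=2.0):
    """Placeholder log-score (an assumption, not the study's model):
    rewards edges in `favoured` and penalises every other edge."""
    return sum(reward if e in favoured else -penalty for e in edges)


def sample_structures(n_nodes, log_score, n_iter=20000, burn_in=5000, seed=0):
    """Metropolis-Hastings over DAGs using a symmetric edge-toggle proposal.

    Returns the fraction of post-burn-in samples in which each edge appears.
    """
    rng = random.Random(seed)
    pairs = [(i, j) for i in range(n_nodes) for j in range(n_nodes) if i != j]
    current = frozenset()
    current_score = log_score(current)
    edge_counts = Counter()
    kept = 0
    for step in range(n_iter):
        edge = rng.choice(pairs)
        proposal = current ^ {edge}  # add the edge if absent, remove it if present
        # Proposals that create a cycle are rejected outright (chain stays put).
        if is_acyclic(proposal, n_nodes):
            proposal_score = log_score(proposal)
            log_ratio = proposal_score - current_score
            if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
                current, current_score = proposal, proposal_score
        if step >= burn_in:
            edge_counts.update(current)
            kept += 1
    return {e: count / kept for e, count in edge_counts.items()}


# Usage: four hypothetical genes, with two edges favoured by the toy score.
favoured = {(0, 1), (1, 2)}
freq = sample_structures(4, lambda g: toy_log_score(g, favoured))
# Summarise the samples: keep edges present in more than half of them.
frequent_edges = {e for e, f in freq.items() if f > 0.5}
```

Averaging over many sampled structures, rather than committing to a single best-scoring network, is one common way to cope with having few samples relative to the number of genes; in a real analysis the toy score would be replaced by a data-driven model score.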


This novel approach gives a comprehensive understanding of the dynamically regulated biological modules.


Keywords: Markov chain Monte Carlo; Bayesian mixture model; Gene network



This study was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education Grant 2017R1D1A1B03032457, and by the Hankuk University of Foreign Studies Research Fund (to Y.K.), and by the Ministry of Science and ICT of Korea Grant 2014M3C9A3063544 and the Ministry of Education of Korea Grant 2016R1D1A1B03930209 (to J.K.).

Compliance with ethical standards

Conflict of interest

Younhee Ko, Jaebum Kim, and Sandra L. Rodriguez-Zas declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human subjects or animals performed by any of the authors.



Copyright information

© The Genetics Society of Korea 2019

Authors and Affiliations

  1. Division of Biomedical Engineering, Hankuk University of Foreign Studies, Gyeonggi-do, South Korea
  2. Department of Animal Sciences, University of Illinois at Urbana-Champaign, Champaign, USA
  3. Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, USA
  4. Department of Biomedical Science and Engineering, Konkuk University, Seoul, South Korea
