
Probabilistic Graphical Modeling in Systems Biology: A Framework for Integrative Approaches

Chapter in: Systems Biology

Abstract

Systems biology may be defined as a discipline that aims to integrate various sources of heterogeneous data in order to describe and predict the function of biological systems. The purpose is to combine many (possibly weak) pieces of evidence from several data types that describe different biological features of genes or proteins. Probabilistic graphical models offer an appealing framework for this objective. Through a detailed review of five selected examples, this chapter highlights how probabilistic graphical models can help build the bridge between biology and computational modeling. Within this methodological framework, the five cases illustrate three features of these models, which we discuss: flexibility, scalability and the ability to combine heterogeneous sources of data. The applications covered address genetic association studies, identification of protein–protein interactions, identification of the target genes of transcription factors, inference of causal phenotype networks and protein function prediction.
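As an illustration of how several weak pieces of evidence can be combined, the sketch below uses one of the simplest probabilistic graphical models, a naive Bayes network, in which each evidence source is assumed conditionally independent of the others given whether a protein pair interacts. This is not one of the five methods reviewed in the chapter; the function name `posterior_interaction`, the prior and the likelihood ratios are hypothetical values chosen for illustration.

```python
def posterior_interaction(prior, likelihood_ratios):
    """Posterior probability that a protein pair interacts.

    Each evidence source contributes a likelihood ratio
    P(evidence | interaction) / P(evidence | no interaction). The sources are
    assumed (hypothetically) to be conditionally independent given the
    interaction status, which is the naive Bayes assumption.
    """
    odds = prior / (1.0 - prior)      # prior odds of interaction
    for lr in likelihood_ratios:
        odds *= lr                    # naive Bayes: posterior odds = prior odds * product of LRs
    return odds / (1.0 + odds)        # convert posterior odds back to a probability


# Hypothetical prior and likelihood ratios for three weak evidence sources
p = posterior_interaction(prior=0.001, likelihood_ratios=[5.0, 8.0, 3.0])
print(round(p, 3))
```

Three sources that are each only mildly informative raise a prior of 0.001 to about 0.107. The product of likelihood ratios is valid only under the conditional-independence assumption, which richer graphical models are designed to relax.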


Notes

  1. Depending on the context, the conditional probability of \(D_1\) given \(D_2\), \({\mathbb {P}} (D_1 \mid D_2)\), is also called the posterior probability of \(D_1\) conditional on \(D_2\).

  2. \({\mathbb {P}} (M \mid D)\ {\mathbb {P}} (D) = {\mathbb {P}} (D \mid M)\ {\mathbb {P}} (M)\).

  3. Gene fusion events provide evidence of a PPI: two proteins that interact in one species are more likely to be fused into a single protein in the genome of another species.

  4. A mixture model is a probabilistic model that represents a population made of \(k\) groups, with mixing proportions \(\pi _1,\ \ldots ,\ \pi _k\) summing to 1.

  5. $$\begin{aligned} {\mathbb {P}} _{(1)}(\varphi _1,\ \varphi _2,\ \varphi _3) = {\mathbb {P}} (\varphi _1)\ {\mathbb {P}} (\varphi _2 \mid \varphi _1)\ {\mathbb {P}} (\varphi _3 \mid \varphi _2)\ \text{ and }\ {\mathbb {P}} _{(2)}(\varphi _1,\ \varphi _2,\ \varphi _3) = {\mathbb {P}} (\varphi _2)\ {\mathbb {P}} (\varphi _1 \mid \varphi _2)\ {\mathbb {P}} (\varphi _3 \mid \varphi _2). \end{aligned}$$

    The equality of the two factorizations follows from Bayes' theorem, since \({\mathbb {P}} (\varphi _1)\ {\mathbb {P}} (\varphi _2 \mid \varphi _1) = {\mathbb {P}} (\varphi _2)\ {\mathbb {P}} (\varphi _1 \mid \varphi _2)\).

  6. If \(X = y + E\), where \(y\) is fixed and \(E \sim \mathcal{N}(0,\sigma ^2)\), then \(X \sim \mathcal{N}(y,\sigma ^2)\).
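
A small numerical check of the identity in Note 2, with the marginal \({\mathbb {P}} (D)\) obtained by total probability over \(M\) and its complement. The function name `bayes_posterior` and the probability values are hypothetical.

```python
def bayes_posterior(prior_m, lik_d_given_m, lik_d_given_not_m):
    # P(D) by the law of total probability over M and its complement
    p_d = lik_d_given_m * prior_m + lik_d_given_not_m * (1.0 - prior_m)
    # Bayes' theorem: P(M | D) = P(D | M) P(M) / P(D)
    post = lik_d_given_m * prior_m / p_d
    return post, p_d


# Hypothetical values: P(M) = 0.3, P(D | M) = 0.8, P(D | not M) = 0.1
post, p_d = bayes_posterior(0.3, 0.8, 0.1)
# Both sides of P(M | D) P(D) = P(D | M) P(M) should agree
print(post * p_d, 0.8 * 0.3)
```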
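
A minimal sketch of the mixture model in Note 4, assuming Gaussian components (the note does not specify the component distributions). It computes the mixture density \(f(x) = \sum_j \pi_j f_j(x)\) and the posterior probability that an observation belongs to each group. Function names and parameter values are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    # Density of N(mu, sigma^2) evaluated at x
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_density(x, weights, mus, sigmas):
    # f(x) = sum_j pi_j * f_j(x), the weights pi_j summing to 1
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))


def membership(x, weights, mus, sigmas):
    # Posterior probability that x comes from group j: pi_j f_j(x) / f(x)
    terms = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
    total = sum(terms)
    return [t / total for t in terms]


# Hypothetical two-group population: 70% centred at 0, 30% centred at 3
weights, mus, sigmas = [0.7, 0.3], [0.0, 3.0], [1.0, 1.0]
print(mixture_density(1.5, weights, mus, sigmas))
print(membership(1.5, weights, mus, sigmas))
```

At \(x = 1.5\), halfway between two means with equal variances, both component densities coincide, so the membership probabilities reduce to the proportions 0.7 and 0.3.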
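
The equality stated in Note 5 can be checked numerically for binary variables. Starting from the chain \(\varphi_1 \rightarrow \varphi_2 \rightarrow \varphi_3\), the sketch derives \({\mathbb {P}} (\varphi_2)\) and \({\mathbb {P}} (\varphi_1 \mid \varphi_2)\) by Bayes' theorem, then compares both factorizations on all eight configurations. The conditional probability tables are hypothetical. Because the two factorizations define the same joint distribution, the corresponding graphs cannot be told apart from that distribution alone.

```python
import itertools

# Hypothetical conditional probability tables for binary variables
p1 = {0: 0.6, 1: 0.4}                      # P(phi1)
p2_given_1 = {0: {0: 0.7, 1: 0.3},         # P(phi2 | phi1), outer key = phi1
              1: {0: 0.2, 1: 0.8}}
p3_given_2 = {0: {0: 0.9, 1: 0.1},         # P(phi3 | phi2), outer key = phi2
              1: {0: 0.4, 1: 0.6}}


def joint_1(a, b, c):
    # P_(1)(phi1, phi2, phi3) = P(phi1) P(phi2 | phi1) P(phi3 | phi2)
    return p1[a] * p2_given_1[a][b] * p3_given_2[b][c]


# Derive P(phi2) by marginalisation and P(phi1 | phi2) by Bayes' theorem
p2 = {b: sum(p1[a] * p2_given_1[a][b] for a in (0, 1)) for b in (0, 1)}
p1_given_2 = {b: {a: p1[a] * p2_given_1[a][b] / p2[b] for a in (0, 1)} for b in (0, 1)}


def joint_2(a, b, c):
    # P_(2)(phi1, phi2, phi3) = P(phi2) P(phi1 | phi2) P(phi3 | phi2)
    return p2[b] * p1_given_2[b][a] * p3_given_2[b][c]


configs = list(itertools.product((0, 1), repeat=3))
max_diff = max(abs(joint_1(*v) - joint_2(*v)) for v in configs)
print(max_diff)  # zero up to floating-point rounding
```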
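
A quick simulation sketch of Note 6, assuming a fixed value of \(y\): draw \(E\) from \(\mathcal{N}(0,\sigma^2)\), form \(X = y + E\), and check that the sample mean and variance of \(X\) are close to \(y\) and \(\sigma^2\). This only checks the first two moments; the function name `simulate_x`, the values of \(y\) and \(\sigma\), and the sample size are arbitrary illustrative choices.

```python
import random
import statistics


def simulate_x(y, sigma, n, seed=0):
    # X = y + E, with y fixed and E ~ N(0, sigma^2) drawn independently n times
    rng = random.Random(seed)
    return [y + rng.gauss(0.0, sigma) for _ in range(n)]


xs = simulate_x(y=2.0, sigma=0.5, n=100_000)
mean_x = statistics.fmean(xs)
var_x = statistics.pvariance(xs, mu=mean_x)
print(mean_x, var_x)  # expected to be close to y = 2.0 and sigma^2 = 0.25
```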

Abbreviations

BN: Bayesian network
ChIP-chip: Chromatin immunoprecipitation on chip
ChIP-seq: Chromatin immunoprecipitation followed by sequencing
CPN: Causal phenotype network
DDI: Domain–domain interaction
DNA: Deoxyribonucleic acid
GA: Genetic architecture
GO: Gene Ontology
GOS: GO sub-ontology
GWAS: Genome-wide association study
MCMC: Markov chain Monte Carlo
MRF: Markov random field
MRF-MJM: MRF mixture joint model
PGM: Probabilistic graphical model
PPI: Protein–protein interaction
QTL: Quantitative trait loci
RNA: Ribonucleic acid
RNAi: RNA interference
ROC curve: Receiver operating characteristic curve
SMM: Standard mixture model
TF: Transcription factor


Acknowledgments

The author wishes to thank the anonymous reviewer for constructive comments on her manuscript and for feedback that was most helpful in producing the final version.

Author information

Corresponding author: Christine Sinoquet.


Copyright information

© 2013 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Sinoquet, C. (2013). Probabilistic Graphical Modeling in Systems Biology: A Framework for Integrative Approaches. In: Prokop, A., Csukás, B. (eds) Systems Biology. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-6803-1_8
