Probabilistic Graphical Modeling in Systems Biology: A Framework for Integrative Approaches

Sinoquet, Christine

doi:10.1007/978-94-007-6803-1_8


Abstract

Systems biology may be defined as a discipline that aims to integrate various sources of heterogeneous data in order to describe and predict the function of biological systems. The purpose is to combine many (possibly weak) pieces of evidence from several data types that describe different biological features of genes or proteins. Probabilistic graphical models offer an appealing framework for this objective. Through a thorough review of five selected examples, this chapter highlights how probabilistic graphical models can help build the bridge between biology and computational modeling. Within this methodological framework, the five cases illustrate three features of these models, which we discuss: flexibility, scalability and the ability to combine heterogeneous sources of data. The applications covered address genetic association studies, identification of protein–protein interactions, identification of the target genes of transcription factors, inference of causal phenotype networks and protein function prediction.


Notes

1.
Depending on the context, the conditional probability of $D_1$ given $D_2$, ${\mathbb {P}} (D_1 \mid D_2)$, is also called the posterior probability of $D_1$ conditional on $D_2$.
2.
${\mathbb {P}} (M \mid D)\ {\mathbb {P}} (D) = {\mathbb {P}} (D \mid M)\ {\mathbb {P}} (M)$.
3.
Gene fusion is indicative of a PPI, since two proteins that interact in one species are more likely to be encoded as a single fused protein in the genome of another species.
4.
A mixture model is a probabilistic model that represents a population composed of $k$ groups (subpopulations), with mixing proportions $\pi _1,\ \ldots ,\ \pi _k$ such that $\sum _{j=1}^{k} \pi _j = 1$.
5.

$$\begin{aligned} {\mathbb {P}} _{(1)}(\varphi _1,\ \varphi _2,\ \varphi _3) = {\mathbb {P}} (\varphi _1)\ {\mathbb {P}} (\varphi _2 \mid \varphi _1)\ {\mathbb {P}} (\varphi _3 \mid \varphi _2)\ \text{ and }\ {\mathbb {P}} _{(2)}(\varphi _1,\ \varphi _2,\ \varphi _3) = {\mathbb {P}} (\varphi _2)\ {\mathbb {P}} (\varphi _1 \mid \varphi _2)\ {\mathbb {P}} (\varphi _3 \mid \varphi _2). \end{aligned}$$
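Equality follows from Bayes' theorem (note 2), since ${\mathbb {P}} (\varphi _1)\ {\mathbb {P}} (\varphi _2 \mid \varphi _1) = {\mathbb {P}} (\varphi _2)\ {\mathbb {P}} (\varphi _1 \mid \varphi _2)$.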
6.
If $X = y + E$, where $y$ is fixed and $E \sim \mathcal{N}(0,\sigma ^2)$, then $X \sim \mathcal{N}(y,\sigma ^2)$.
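A small numerical check of notes 2 and 5, given as a sketch: it assumes three hypothetical binary variables phi1, phi2, phi3 with made-up conditional probability tables (none of these values come from the chapter). It builds the joint distribution from factorization (1), derives P(phi2) and P(phi1 | phi2) from it with Bayes' theorem, and confirms that factorization (2) gives the same joint probabilities. The two factorizations correspond to the graphs phi1 → phi2 → phi3 and phi1 ← phi2 → phi3.

```python
import itertools

# Hypothetical binary variables phi1, phi2, phi3. All probabilities below are
# arbitrary illustrative values, not data from the chapter.
p1 = {0: 0.3, 1: 0.7}                        # P(phi1)
p2_given_1 = {0: {0: 0.9, 1: 0.1},           # P(phi2 | phi1), indexed [phi1][phi2]
              1: {0: 0.2, 1: 0.8}}
p3_given_2 = {0: {0: 0.6, 1: 0.4},           # P(phi3 | phi2), indexed [phi2][phi3]
              1: {0: 0.25, 1: 0.75}}


def joint_chain1(a, b, c):
    """Factorization (1): P(phi1) P(phi2 | phi1) P(phi3 | phi2)."""
    return p1[a] * p2_given_1[a][b] * p3_given_2[b][c]


# Bayes' theorem (note 2 with M = phi1, D = phi2):
# P(phi1 | phi2) P(phi2) = P(phi2 | phi1) P(phi1)
p2 = {b: sum(p1[a] * p2_given_1[a][b] for a in (0, 1)) for b in (0, 1)}
p1_given_2 = {b: {a: p1[a] * p2_given_1[a][b] / p2[b] for a in (0, 1)}
              for b in (0, 1)}


def joint_chain2(a, b, c):
    """Factorization (2): P(phi2) P(phi1 | phi2) P(phi3 | phi2)."""
    return p2[b] * p1_given_2[b][a] * p3_given_2[b][c]


def max_abs_difference():
    """Largest gap between the two factorizations over all 8 configurations."""
    return max(abs(joint_chain1(a, b, c) - joint_chain2(a, b, c))
               for a, b, c in itertools.product((0, 1), repeat=3))


print(max_abs_difference())  # zero up to floating-point rounding
```

Because both factorizations describe the same joint distribution, data on phi1, phi2 and phi3 alone cannot favour one orientation of the phi1–phi2 edge over the other.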
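Note 4 can be made concrete with a short sketch of a $k$-group mixture. Note 4 does not fix the distribution inside each group, so the Gaussian components and every parameter value below are illustrative assumptions. The density is the weighted sum of the component densities, and sampling first draws a group label $j$ with probability $\pi_j$, then draws a value from that group.

```python
import math
import random

# Hypothetical 3-component mixture. Gaussian components and all parameter
# values are illustrative assumptions, not taken from the chapter.
weights = [0.5, 0.3, 0.2]   # mixing proportions pi_1..pi_k, summing to 1
means = [-2.0, 0.0, 3.0]
sds = [0.5, 1.0, 0.8]


def normal_pdf(x, mu, sd):
    """Density of N(mu, sd^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x):
    """Mixture density: sum over groups j of pi_j * f_j(x)."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))


def sample(n, rng):
    """Draw a group label j with probability pi_j, then x from group j."""
    draws = []
    for _ in range(n):
        j = rng.choices(range(len(weights)), weights=weights)[0]
        draws.append((j, rng.gauss(means[j], sds[j])))
    return draws


rng = random.Random(0)
draws = sample(10_000, rng)
freq = [sum(1 for j, _ in draws if j == g) / len(draws) for g in range(len(weights))]
print(freq)  # empirical group proportions, close to the mixing proportions
```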
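Note 6 can be checked empirically with a minimal simulation, assuming illustrative values $y = 2.5$ and $\sigma = 0.7$ (not from the chapter). Adding a fixed $y$ to Gaussian noise only shifts the distribution, so the sample mean of $X$ should be close to $y$ and its variance close to $\sigma^2 = 0.49$. The function name `simulate_x` is hypothetical.

```python
import random
import statistics


def simulate_x(y, sigma, n, seed=1):
    """Draw n values of X = y + E, with E ~ N(0, sigma^2) and y held fixed."""
    rng = random.Random(seed)
    return [y + rng.gauss(0.0, sigma) for _ in range(n)]


y, sigma = 2.5, 0.7          # illustrative values only
xs = simulate_x(y, sigma, 50_000)
mean_x = statistics.mean(xs)
var_x = statistics.pvariance(xs)
print(mean_x, var_x)         # close to y and sigma^2
```

This shift is what turns a model with additive Gaussian noise into a Gaussian likelihood centred on the noise-free value $y$.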

Abbreviations

BN:: Bayesian network
ChIP-chip:: Chromatin immunoprecipitation on chip
ChIP-seq:: Chromatin immunoprecipitation followed by sequencing
CPN:: Causal phenotype network
DDI:: Domain-domain interaction
DNA:: Deoxyribonucleic acid
GA:: Genetic architecture
GO:: Gene ontology
GOS:: GO sub-ontology
GWAS:: Genome-wide association study
MCMC:: Markov chain Monte Carlo
MRF:: Markov random field
MRF-MJM:: MRF mixture joint model
PGM:: Probabilistic graphical model
PPI:: Protein–protein interaction
QTL:: Quantitative trait loci
RNA:: Ribonucleic acid
RNAi:: RNA interference
ROC curve:: Receiver operating characteristic curve
SMM:: Standard mixture model
TF:: Transcription factor


Acknowledgments

The author wishes to thank the anonymous reviewer for constructive comments on her manuscript and for feedback that was most helpful in producing the final version.

Author information

Authors and Affiliations

LINA, UMR CNRS 6241, Université de Nantes, 2 rue de la Houssinière, BP 92208, 44322, Nantes Cedex, France
Christine Sinoquet


Corresponding author

Correspondence to Christine Sinoquet.

Editor information

Editors and Affiliations

Chemical and Biological Engineering, Vanderbilt University, Nashville, TN, USA
Aleš Prokop
Research Group on Process Network Engineering, Kaposvár University, Kaposvár, Hungary
Béla Csukás


About this chapter

Cite this chapter

Sinoquet, C. (2013). Probabilistic Graphical Modeling in Systems Biology: A Framework for Integrative Approaches. In: Prokop, A., Csukás, B. (eds) Systems Biology. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-6803-1_8


DOI: https://doi.org/10.1007/978-94-007-6803-1_8
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-6802-4
Online ISBN: 978-94-007-6803-1
eBook Packages: Biomedical and Life Sciences (R0)
