Gibbs sampler

  • Xuhua Xia


The Gibbs sampler is used for de novo motif discovery. Suppose we have a set of sequences, each containing a regulatory motif whose position varies from sequence to sequence, but we do not know what the motif looks like or where it is located within each sequence. The Gibbs sampler will find such a motif if it is well represented in these sequences. For example, given a set of yeast intron sequences, each containing a branchpoint site (BPS) somewhere, the Gibbs sampler will find these BPSs even if we do not know what a BPS looks like or where it is located along the intron sequence. Another scenario involves the discovery of protein-binding sites (e.g., transcription factor binding sites) given a set of sequences from ChIP-Seq. Each of these sequences has a short segment with affinity to a protein, but we do not know what the segment looks like or where it is located within the sequence. The Gibbs sampler shines in discovering such protein-binding sites. This chapter opens the black box of the Gibbs sampler and numerically illustrates each of its computational steps, including the site sampler (which assumes that each input sequence harbors a signal motif) and the motif sampler (which is used when some sequences may contain multiple signal motifs and some none).
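To make the site-sampler idea concrete, the following is a minimal Python sketch, not the chapter's own implementation. It assumes exactly one motif occurrence per sequence, uppercase DNA over A, C, G, T only, a fixed motif width supplied by the user, and every sequence at least that long. The function names, the pseudocount of 1, the iteration count, and the use of a simple odds ratio against overall nucleotide frequencies are all illustrative assumptions.

```python
import random

ALPHABET = "ACGT"  # assumption: input is uppercase DNA with no ambiguity codes


def background_freqs(seqs, pseudocount=1.0):
    """Overall nucleotide frequencies across all input sequences."""
    counts = {b: pseudocount for b in ALPHABET}
    for seq in seqs:
        for base in seq:
            counts[base] += 1
    total = sum(counts.values())
    return {b: counts[b] / total for b in ALPHABET}


def build_pwm(sites, pseudocount=1.0):
    """Per-column nucleotide frequencies of the aligned candidate sites."""
    pwm = []
    for col in range(len(sites[0])):
        counts = {b: pseudocount for b in ALPHABET}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        pwm.append({b: counts[b] / total for b in ALPHABET})
    return pwm


def window_odds(window, pwm, bg):
    """Odds that a window was generated by the motif model rather than background."""
    odds = 1.0
    for col, base in enumerate(window):
        odds *= pwm[col][base] / bg[base]
    return odds


def site_sampler(seqs, width, n_iter=200, seed=0):
    """Return one sampled motif start position per sequence."""
    rng = random.Random(seed)
    bg = background_freqs(seqs)
    # Start from a random motif position in every sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(n_iter):
        for i, seq in enumerate(seqs):
            # Predictive step: build the motif model from all other sequences.
            others = [s[p:p + width]
                      for j, (s, p) in enumerate(zip(seqs, starts)) if j != i]
            pwm = build_pwm(others)
            # Sampling step: draw a new position in the held-out sequence
            # with probability proportional to its odds ratio.
            weights = [window_odds(seq[p:p + width], pwm, bg)
                       for p in range(len(seq) - width + 1)]
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts
```

Calling `site_sampler(["ACGTACGTTGACA", "TTGACAGGGCCC", "GGCATTGACATT"], width=6)` returns a list with one start index per sequence. Because positions are sampled rather than chosen greedily, different seeds can give different answers, and in practice several independent runs would be compared. The motif sampler mentioned above relaxes the one-site-per-sequence assumption and is not covered by this sketch.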



Copyright information

© Springer Science+Business Media LLC 2018

Authors and Affiliations

  • Xuhua Xia (1)
  1. University of Ottawa, CAREG and Biology Department, Ottawa, Canada
