Bioinformatics pp 137-163

Regulatory Motif Analysis

  • Alan Moses
  • Saurabh Sinha


The first complete genome sequences of eukaryotes revealed that much of the genetic material did not code for protein sequences (Lander et al. 2001; Venter et al. 2001). Although this noncoding DNA was once thought to be “junk” DNA, it is now appreciated that large portions of it are actively conserved over evolution (Waterston et al. 2002; Johnston and Stormo 2003), suggesting that these regions contain important functional elements.

A first hypothesis about the function of this noncoding DNA is that it is involved in the regulation of gene activity. One of the best-understood mechanisms of gene regulation is the modulation of transcriptional initiation by sequence-specific DNA-binding proteins (transcription factors). These proteins recognize short sequences in noncoding DNA that fall into families, or that share consensus patterns known as motifs.
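To make the idea of a consensus pattern concrete, the following is a minimal sketch, not the authors' method. It assumes a handful of already-aligned binding sites of equal length; the sequences in `SITES` and the helper names `count_matrix`, `consensus`, and `count_mismatches` are invented for illustration and do not come from the chapter.

```python
from collections import Counter

# Hypothetical aligned binding sites, invented for illustration only.
SITES = ["TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA", "TGACTAA"]


def count_matrix(sites):
    """Return one Counter of base counts per alignment column."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    return [Counter(site[i] for site in sites) for i in range(length)]


def consensus(sites):
    """Return the most frequent base per column (ties broken alphabetically)."""
    return "".join(
        min(column, key=lambda base: (-column[base], base))
        for column in count_matrix(sites)
    )


def count_mismatches(site, pattern):
    """Count positions at which a site differs from a pattern of equal length."""
    return sum(a != b for a, b in zip(site, pattern))


if __name__ == "__main__":
    pattern = consensus(SITES)
    for site in SITES:
        print(site, count_mismatches(site, pattern))
```

The per-column counts are the raw material for a matrix-style motif model, whereas the consensus string keeps only the most common base at each position and so discards the variation among family members.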


Hidden Markov Model, Transcription Factor Binding Site, Motif Finding, Motif Model, Background Distribution
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


  1. Aerts S, Haeussler M, van Vooren S, Griffith OL, Hulpiau P, Jones SJ et al (2008) Text-mining assisted regulatory annotation. Genome Biol 9(2):R31
  2. Bailey TL, Elkan C (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol 2:28–36
  3. Bailey TL, Gribskov M (1998) Methods and statistics for combining motif match scores. J Comput Biol 5(2):211–221
  4. Barash Y, Bejerano G, Friedman N (2001) A simple hyper-geometric approach for discovering putative transcription factor binding sites. Proceedings of the first international workshop on algorithms in bioinformatics, Springer
  5. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: A practical and powerful approach to multiple testing. J Royal Stat Soc B 57(1):289–300
  6. Berg OG, von Hippel PH (1987) Selection of DNA binding sites by regulatory proteins. Statistical-mechanical theory and application to operators and promoters. J Mol Biol 193(4):723–750
  7. Bergman CM, Carlson JW, Celniker SE (2005) Drosophila DNase I footprint database: A systematic genome annotation of transcription factor binding sites in the fruitfly, Drosophila melanogaster. Bioinformatics 21(8):1747–1749
  8. Berman BP, Nibu Y, Pfeiffer BD, Tomancak P, Celniker SE, Levine M et al (2002) Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc Natl Acad Sci USA 99(2):757–762
  9. Blanchette M, Tompa M (2003) FootPrinter: A program designed for phylogenetic footprinting. Nucleic Acids Res 31(13):3840–3842
  10. Bussemaker HJ, Li H, Siggia ED (2000) Building a dictionary for genomes: Identification of presumptive regulatory sites by statistical analysis. Proc Natl Acad Sci USA 97(18):10096–10100
  11. Bussemaker HJ, Li H, Siggia ED (2001) Regulatory element detection using correlation with expression. Nat Genet 27(2):167–171
  12. Chiang DY, Brown PO, Eisen MB (2001) Visualizing associations between genome sequences and gene expression data using genome-mean expression profiles. Bioinformatics 17(Suppl 1):S49–S55
  13. Down TA, Hubbard TJ (2005) NestedMICA: Sensitive inference of over-represented motifs in nucleic acid sequence. Nucleic Acids Res 33(5):1445–1453
  14. Dubchak I, Ryaboy DV (2006) VISTA family of computational tools for comparative analysis of DNA sequences and whole genomes. Methods Mol Biol 338:69–89
  15. Durbin R, Eddy SR, Krogh A, Mitchison GJ (1998) Biological sequence analysis: Probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge, UK
  16. Eden E, Lipson D, Yogev S, Yakhini Z (2007) Discovering motifs in ranked lists of DNA sequences. PLoS Comput Biol 3(3):e39
  17. Eskin E, Pevzner PA (2002) Finding composite regulatory patterns in DNA sequences. Bioinformatics 18(Suppl 1):S354–S363
  18. Felsenstein J (1981) Evolutionary trees from DNA sequences: A maximum likelihood approach. J Mol Evol 17(6):368–376
  19. Frith MC, Li MC, Weng Z (2003) Cluster-Buster: Finding dense clusters of motifs in DNA sequences. Nucleic Acids Res 31(13):3666–3668
  20. Gadiraju S, Vyhlidal CA, Leeder JS, Rogan PK (2003) Genome-wide prediction, display and refinement of binding sites with information theory-based models. BMC Bioinformatics 4:38
  21. Gallo SM, Li L, Hu Z, Halfon MS (2006) REDfly: A regulatory element database for Drosophila. Bioinformatics 22(3):381–383
  22. Halfon MS, Grad Y, Church GM, Michelson AM (2002) Computation-based discovery of related transcriptional regulatory modules and motifs using an experimentally validated combinatorial model. Genome Res 12(7):1019–1028
  23. Heinemeyer T, Wingender E, Reuter I, Hermjakob H, Kel AE, Kel OV et al (1998) Databases on transcriptional regulation: TRANSFAC, TRRD and COMPEL. Nucleic Acids Res 26(1):362–367
  24. Hertz GZ, Stormo GD (1999) Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15(7–8):563–577
  25. Johnston M, Stormo GD (2003) Evolution. Heirlooms in the attic. Science 302(5647):997–999
  26. Kechris KJ, van Zwet E, Bickel PJ, Eisen MB (2004) Detecting DNA regulatory motifs by incorporating positional trends in information content. Genome Biol 5(7):R50
  27. Kellis M, Patterson N, Birren B, Berger B, Lander ES (2004) Methods in comparative genomics: Genome correspondence, gene identification and regulatory motif discovery. J Comput Biol 11(2–3):319–355
  28. Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86
  29. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J et al (2001) Initial sequencing and analysis of the human genome. Nature 409(6822):860–921
  30. Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF, Wootton JC (1993) Detecting subtle sequence signals: A Gibbs sampling strategy for multiple alignment. Science 262(5131):208–214
  31. Lawrence CE, Reilly AA (1990) An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins 7(1):41–51
  32. Levine M, Davidson EH (2005) Gene regulatory networks for development. Proc Natl Acad Sci USA 102(14):4936–4942
  33. Lifanov AP, Makeev VJ, Nazina AG, Papatsenko DA (2003) Homotypic regulatory clusters in Drosophila. Genome Res 13(4):579–588
  34. Liu JS, Neuwald AF, Lawrence CE (1995) Bayesian models for multiple local sequence alignment and Gibbs sampling strategies. J Am Stat Assoc 90(432):1156–1170
  35. Mannervik M, Nibu Y, Zhang H, Levine M (1999) Transcriptional coregulators in development. Science 284(5414):606–609
  36. Markstein M, Levine M (2002) Decoding cis-regulatory DNAs in the Drosophila genome. Curr Opin Genet Dev 12(5):601–606
  37. Montgomery SB, Griffith OL, Sleumer MC, Bergman CM, Bilenky M, Pleasance ED et al (2006) ORegAnno: An open access database and curation system for literature-derived promoters, transcription factor binding sites and regulatory variation. Bioinformatics 22(5):637–640
  38. Moses AM, Chiang DY, Eisen MB (2004a) Phylogenetic motif detection by expectation-maximization on evolutionary mixtures. Pac Symp Biocomput:324–335
  39. Moses AM, Chiang DY, Pollard DA, Iyer VN, Eisen MB (2004b) MONKEY: Identifying conserved transcription-factor binding sites in multiple alignments using a binding site-specific evolutionary model. Genome Biol 5(12):R98
  40. Münch R, Hiller K, Barg H, Heldt D, Linz S, Wingender E et al (2003) PRODORIC: Prokaryotic database of gene regulation. Nucleic Acids Res 31(1):266–269
  41. Ovcharenko I, Boffelli D, Loots GG (2004) eShadow: A tool for comparing closely related sequences. Genome Res 14(6):1191–1198
  42. Pavesi G, Mereghetti P, Mauri G, Pesole G (2004) Weeder Web: Discovery of transcription factor binding sites in a set of sequences from co-regulated genes. Nucleic Acids Res 32(Web Server issue):W199–W203
  43. Sandelin A, Alkema W, Engstrom P, Wasserman WW, Lenhard B (2004a) JASPAR: An open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res 32(Database issue):D91–D94
  44. Sandelin A, Wasserman WW, Lenhard B (2004b) ConSite: Web-based prediction of regulatory elements using cross-species comparison. Nucleic Acids Res 32(Web Server issue):W249–W252
  45. Schneider TD, Stormo GD, Gold L, Ehrenfeucht A (1986) Information content of binding sites on nucleotide sequences. J Mol Biol 188(3):415–431
  46. Segal E, Raveh-Sadka T, Schroeder M, Unnerstall U, Gaul U (2008) Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature 451(7178):535–540
  47. Segal E, Yelensky R, Koller D (2003) Genome-wide discovery of transcriptional modules from DNA sequence and gene expression. Bioinformatics 19(Suppl 1):i273–i282
  48. Siddharthan R, Siggia ED, van Nimwegen E (2005) PhyloGibbs: A Gibbs sampling motif finder that incorporates phylogeny. PLoS Comput Biol 1(7):e67
  49. Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K et al (2005) Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res 15(8):1034–1050
  50. Sinha S, Blanchette M, Tompa M (2004) PhyME: A probabilistic algorithm for finding motifs in sets of orthologous sequences. BMC Bioinformatics 5:170
  51. Sinha S, Liang Y, Siggia E (2006) Stubb: A program for discovery and analysis of cis-regulatory modules. Nucleic Acids Res 34(Web Server issue):W555–W559
  52. Sinha S, Tompa M (2000) A statistical method for finding transcription factor binding sites. Proc Int Conf Intell Syst Mol Biol 8:344–354
  53. Smith AD, Sumazin P, Zhang MQ (2005) Identifying tissue-selective transcription factor binding sites in vertebrate promoters. Proc Natl Acad Sci USA 102(5):1560–1565
  54. Staden R (1989) Methods for calculating the probabilities of finding patterns in sequences. Comput Appl Biosci 5(2):89–96
  55. Stormo GD (2000) DNA binding sites: Representation and discovery. Bioinformatics 16(1):16–23
  56. Stormo GD, Hartzell GW III (1989) Identifying protein-binding sites from unaligned DNA fragments. Proc Natl Acad Sci USA 86(4):1183–1187
  57. Tompa M (1999) An exact method for finding short motifs in sequences, with application to the ribosome binding site problem. Proc Int Conf Intell Syst Mol Biol:262–271
  58. Tompa M, Li N, Bailey TL, Church GM, De Moor B, Eskin E et al (2005) Assessing computational tools for the discovery of transcription factor binding sites. Nat Biotechnol 23(1):137–144
  59. van Helden J, Andre B, Collado-Vides J (1998) Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J Mol Biol 281(5):827–842
  60. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG et al (2001) The sequence of the human genome. Science 291(5507):1304–1351
  61. Wasserman WW, Fickett JW (1998) Identification of regulatory regions which confer muscle-specific gene expression. J Mol Biol 278(1):167–181
  62. Wasserman WW, Sandelin A (2004) Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet 5(4):276–287
  63. Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P et al (2002) Initial sequencing and comparative analysis of the mouse genome. Nature 420(6915):520–562
  64. Wingender E, Dietze P, Karas H, Knuppel R (1996) TRANSFAC: A database on transcription factors and their DNA binding sites. Nucleic Acids Res 24(1):238–241
  65. Zhu J, Zhang MQ (1999) SCPD: A promoter database of the yeast Saccharomyces cerevisiae. Bioinformatics 15(7–8):607–611

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Department of Cell & Systems Biology, University of Toronto, Toronto, Canada
  2. Dept. of Computer Sciences, University of Illinois, Urbana, USA
