A Fast, Alignment-Free, Conservation-Based Method for Transcription Factor Binding Site Discovery

  • Raluca Gordân
  • Leelavati Narlikar
  • Alexander J. Hartemink
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4955)


As an increasing number of eukaryotic genomes are being sequenced, comparative studies aimed at detecting regulatory elements in intergenic sequences are becoming more prevalent. Most comparative methods for transcription factor (TF) binding site discovery make use of global or local alignments of orthologous regulatory regions to assess whether a particular DNA site is conserved across related organisms, and thus more likely to be functional. Since binding sites are usually short, sometimes degenerate, and often independent of orientation, alignment algorithms may not align them correctly. Here, we present a novel, alignment-free approach for incorporating conservation information into TF motif discovery. We relax the definition of conserved sites: we consider a DNA site within a regulatory region to be conserved in an orthologous sequence if it occurs anywhere in that sequence, irrespective of orientation. We use this definition to derive informative priors over DNA sequence positions, and incorporate these priors into a Gibbs sampling algorithm for motif discovery. Our approach is simple and fast. It does not require sequence alignments, nor the phylogenetic relationships between the orthologous sequences, and yet it is more effective on real biological data than methods that do.
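The relaxed conservation definition above can be illustrated with a minimal sketch. Only the definition itself (a site counts as conserved in an orthologous sequence if it occurs anywhere in that sequence, in either orientation) is taken from the abstract. The function names, the fraction-of-orthologs score, the pseudocount, and the normalization into a prior over positions are assumptions made for illustration and are not the authors' actual formulation.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(site: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return site.translate(_COMPLEMENT)[::-1]


def is_conserved(site: str, ortholog: str) -> bool:
    """Relaxed definition from the abstract: the site occurs anywhere
    in the orthologous sequence, irrespective of orientation."""
    return site in ortholog or reverse_complement(site) in ortholog


def conservation_scores(reference: str, orthologs: List[str], width: int) -> List[float]:
    """For every window of the given width in the reference sequence,
    compute the fraction of orthologs in which it is conserved.
    (Assumed scoring scheme, chosen for illustration only.)"""
    if not orthologs:
        raise ValueError("at least one orthologous sequence is required")
    scores = []
    for start in range(len(reference) - width + 1):
        site = reference[start:start + width]
        hits = sum(is_conserved(site, ortholog) for ortholog in orthologs)
        scores.append(hits / len(orthologs))
    return scores


def positional_prior(scores: List[float], pseudocount: float = 0.1) -> List[float]:
    """Turn conservation scores into a normalized prior over start positions.
    The pseudocount (an assumption) keeps unconserved positions possible."""
    weights = [score + pseudocount for score in scores]
    total = sum(weights)
    return [weight / total for weight in weights]


if __name__ == "__main__":
    reference = "TTACGTTGG"
    orthologs = ["GGAACGTGG", "CCCCCCCCC"]
    prior = positional_prior(conservation_scores(reference, orthologs, width=5))
    print(prior)
```

No alignment and no phylogenetic tree is needed here: each ortholog is scanned independently with plain substring search. In a Gibbs sampler, a prior of this kind would typically weight the per-position sampling probabilities alongside the motif likelihood, although how the authors combine the two is not specified in the abstract.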


Keywords: Motif Discovery, Related Organism, Orthologous Sequence, Intergenic Sequence, Conservation Information
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Raluca Gordân (1)
  • Leelavati Narlikar (1)
  • Alexander J. Hartemink (1)
  1. Department of Computer Science, Duke University, Durham
