A Fast, Alignment-Free, Conservation-Based Method for Transcription Factor Binding Site Discovery

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4955)

Abstract

As an increasing number of eukaryotic genomes are being sequenced, comparative studies aimed at detecting regulatory elements in intergenic sequences are becoming more prevalent. Most comparative methods for transcription factor (TF) binding site discovery make use of global or local alignments of orthologous regulatory regions to assess whether a particular DNA site is conserved across related organisms, and thus more likely to be functional. Since binding sites are usually short, sometimes degenerate, and often independent of orientation, alignment algorithms may not align them correctly. Here, we present a novel, alignment-free approach for incorporating conservation information into TF motif discovery. We relax the definition of conserved sites: we consider a DNA site within a regulatory region to be conserved in an orthologous sequence if it occurs anywhere in that sequence, irrespective of orientation. We use this definition to derive informative priors over DNA sequence positions, and incorporate these priors into a Gibbs sampling algorithm for motif discovery. Our approach is simple and fast. It does not require sequence alignments, nor the phylogenetic relationships between the orthologous sequences, and yet it is more effective on real biological data than methods that do.
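
The relaxed definition of conservation described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: sites are fixed-width windows matched exactly, the function names (`reverse_complement`, `occurs_anywhere`, `conservation_scores`) are hypothetical, and the per-position score is simply the fraction of orthologous sequences that contain the site. The abstract does not say how the actual priors are derived from such information.

```python
from typing import List

# Translation table for complementing uppercase DNA (assumes only A, C, G, T).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(site: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return site.translate(_COMPLEMENT)[::-1]


def occurs_anywhere(site: str, sequence: str) -> bool:
    """True if the site, or its reverse complement, occurs anywhere in sequence.

    This mirrors the relaxed definition: position and orientation are ignored.
    Exact matching is an illustrative simplification.
    """
    return site in sequence or reverse_complement(site) in sequence


def conservation_scores(reference: str, orthologs: List[str], width: int) -> List[float]:
    """Score each width-long window of the reference sequence.

    The score is the fraction of orthologous sequences in which the window is
    'conserved' under the relaxed definition. No alignment or phylogeny is used.
    """
    if width <= 0 or not orthologs:
        raise ValueError("width must be positive and orthologs non-empty")
    scores = []
    for start in range(len(reference) - width + 1):
        site = reference[start:start + width]
        hits = sum(occurs_anywhere(site, ortholog) for ortholog in orthologs)
        scores.append(hits / len(orthologs))
    return scores


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    print(conservation_scores("ACGTTT", ["GGAAACG", "TTTT"], width=3))
```

Scores of this kind could, in principle, be normalized into a positional prior for a Gibbs sampler, but that step is left out here because the abstract does not specify it.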

Editor information

Martin Vingron, Limsoon Wong

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gordân, R., Narlikar, L., Hartemink, A.J. (2008). A Fast, Alignment-Free, Conservation-Based Method for Transcription Factor Binding Site Discovery. In: Vingron, M., Wong, L. (eds) Research in Computational Molecular Biology. RECOMB 2008. Lecture Notes in Computer Science, vol 4955. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78839-3_9

  • DOI: https://doi.org/10.1007/978-3-540-78839-3_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78838-6

  • Online ISBN: 978-3-540-78839-3

  • eBook Packages: Computer Science, Computer Science (R0)
