A Fast, Alignment-Free, Conservation-Based Method for Transcription Factor Binding Site Discovery

Gordân, Raluca; Narlikar, Leelavati; Hartemink, Alexander J.

doi:10.1007/978-3-540-78839-3_9

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4955)

Abstract

As an increasing number of eukaryotic genomes are being sequenced, comparative studies aimed at detecting regulatory elements in intergenic sequences are becoming more prevalent. Most comparative methods for transcription factor (TF) binding site discovery make use of global or local alignments of orthologous regulatory regions to assess whether a particular DNA site is conserved across related organisms, and thus more likely to be functional. Since binding sites are usually short, sometimes degenerate, and often independent of orientation, alignment algorithms may not align them correctly. Here, we present a novel, alignment-free approach for incorporating conservation information into TF motif discovery. We relax the definition of conserved sites: we consider a DNA site within a regulatory region to be conserved in an orthologous sequence if it occurs anywhere in that sequence, irrespective of orientation. We use this definition to derive informative priors over DNA sequence positions, and incorporate these priors into a Gibbs sampling algorithm for motif discovery. Our approach is simple and fast. It does not require sequence alignments, nor the phylogenetic relationships between the orthologous sequences, and yet it is more effective on real biological data than methods that do.
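
The relaxed definition of conservation described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the abstract does not say how the priors are computed from conservation, so the code below only scores each site in a reference sequence by the fraction of orthologous sequences that contain it, in either orientation, anywhere in the sequence. The function names, the fixed site width `k`, and the assumption of uppercase A/C/G/T input are all hypothetical choices made for illustration.

```python
from typing import List

# Translation table for complementing uppercase DNA (assumes A/C/G/T only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(site: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return site.translate(_COMPLEMENT)[::-1]


def conservation_scores(reference: str, orthologs: List[str], k: int) -> List[float]:
    """Score every width-k site in `reference` by alignment-free conservation.

    A site counts as conserved in an orthologous sequence if the site or its
    reverse complement occurs anywhere in that sequence. The score is the
    fraction of orthologs in which the site is conserved. (Hypothetical
    scoring; the paper's actual prior may be defined differently.)
    """
    if k <= 0 or k > len(reference) or not orthologs:
        return []
    scores = []
    for start in range(len(reference) - k + 1):
        site = reference[start:start + k]
        rc = reverse_complement(site)
        # Plain substring search: no alignment and no phylogeny is used.
        hits = sum(1 for seq in orthologs if site in seq or rc in seq)
        scores.append(hits / len(orthologs))
    return scores


if __name__ == "__main__":
    print(conservation_scores("ACGTTT", ["GGACGG", "AAACGG"], 3))
```

Scores like these could in principle be normalized into a positional prior for a Gibbs sampler, but that step is left out here because the abstract does not specify it.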

Author information

Authors and Affiliations

Department of Computer Science, Duke University, Durham, NC 27708-0129
Raluca Gordân, Leelavati Narlikar & Alexander J. Hartemink

Editor information

Martin Vingron, Limsoon Wong

Cite this paper

Gordân, R., Narlikar, L., Hartemink, A.J. (2008). A Fast, Alignment-Free, Conservation-Based Method for Transcription Factor Binding Site Discovery. In: Vingron, M., Wong, L. (eds) Research in Computational Molecular Biology. RECOMB 2008. Lecture Notes in Computer Science, vol 4955. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78839-3_9

DOI: https://doi.org/10.1007/978-3-540-78839-3_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78838-6
Online ISBN: 978-3-540-78839-3
eBook Packages: Computer Science, Computer Science (R0)
