An Evolutionary Model of DNA Substring Distribution

  • Chapter
Algorithms and Applications

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6060)

Abstract

DNA sequence analysis methods, such as motif discovery, gene detection, or phylogeny reconstruction, can often provide important input for biological studies. Many such methods require a background model that represents the expected distribution of short substrings in a given DNA region. Most current techniques for modeling this distribution disregard the evolutionary processes underlying DNA formation. We propose a novel approach for modeling the DNA k-mer distribution that takes evolution and natural selection into account. We derive a computationally tractable approximation for estimating k-mer probabilities at genetic equilibrium, given a description of the evolutionary processes in terms of fitness and mutation probabilities. We assess the quality of this approximation through numerical experiments. Besides providing a generative model for DNA sequences, our method has further applications in motif discovery.
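To make the notion of a k-mer distribution at genetic equilibrium concrete, the following is a minimal brute-force sketch and not the approximation derived in the chapter. It assumes a simple model that the abstract does not specify: an infinite population of k-mers, selection that reweights each k-mer by a fitness value, and independent per-position point mutations with probability mu, spread uniformly over the three other nucleotides. The names all_kmers, mutation_prob, and equilibrium are hypothetical, and the approach enumerates all 4^k k-mers, so it is only feasible for small k.

```python
from itertools import product

ALPHABET = "ACGT"


def all_kmers(k):
    """Enumerate all 4**k DNA k-mers in lexicographic order."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def mutation_prob(src, dst, mu):
    """Probability that src becomes dst in one generation.

    Assumption: each position mutates independently with probability mu,
    choosing uniformly among the three other nucleotides.
    """
    prob = 1.0
    for a, b in zip(src, dst):
        prob *= (1.0 - mu) if a == b else mu / 3.0
    return prob


def equilibrium(k, fitness, mu, iterations=2000, tol=1e-12):
    """Iterate selection followed by mutation until the distribution settles.

    fitness maps k-mers to relative fitness; unlisted k-mers default to 1.0.
    Returns a dict mapping each k-mer to its estimated equilibrium probability.
    """
    kmers = all_kmers(k)
    n = len(kmers)
    # Row-stochastic mutation matrix: M[i][j] = P(kmers[i] -> kmers[j]).
    M = [[mutation_prob(s, d, mu) for d in kmers] for s in kmers]
    p = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        # Selection: reweight by fitness and renormalize.
        selected = [p[i] * fitness.get(kmers[i], 1.0) for i in range(n)]
        total = sum(selected)
        selected = [x / total for x in selected]
        # Mutation: redistribute probability mass along the mutation matrix.
        new_p = [sum(selected[i] * M[i][j] for i in range(n)) for j in range(n)]
        diff = max(abs(a - b) for a, b in zip(p, new_p))
        p = new_p
        if diff < tol:
            break
    return dict(zip(kmers, p))


if __name__ == "__main__":
    # Hypothetical example: the dinucleotide CG is selected against.
    dist = equilibrium(k=2, fitness={"CG": 0.5}, mu=0.01)
    for kmer in sorted(dist, key=dist.get):
        print(kmer, round(dist[kmer], 4))
```

In this toy setting a disfavored k-mer ends up less probable than a neutral one, while uniform fitness leaves the uniform distribution unchanged. Because the cost grows with 4^k, exact iteration of this kind quickly becomes impractical, which illustrates why a tractable approximation is of interest.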

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Kull, M., Tretyakov, K., Vilo, J. (2010). An Evolutionary Model of DNA Substring Distribution. In: Elomaa, T., Mannila, H., Orponen, P. (eds) Algorithms and Applications. Lecture Notes in Computer Science, vol 6060. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12476-1_10

  • DOI: https://doi.org/10.1007/978-3-642-12476-1_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12475-4

  • Online ISBN: 978-3-642-12476-1

  • eBook Packages: Computer Science, Computer Science (R0)
