The Statistical Power of Phylogenetic Motif Models

  • Conference paper
In: Research in Computational Molecular Biology (RECOMB 2008)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4955)

Abstract

One component of the genomic program controlling the transcriptional regulation of genes is the location and arrangement of transcription factors bound to the promoter and enhancer regions of a gene. Because the genomic locations of the functional binding sites of most transcription factors are not yet known, predicting them is of great importance. Unfortunately, it is well known that the low specificity of the binding of transcription factors to DNA makes such prediction, using position-specific probability matrices (motifs) alone, subject to huge numbers of false positives. One approach to alleviating this problem has been to use phylogenetic “shadowing” or “footprinting” to remove unconserved regions of the genome from consideration. Another approach has been to combine a phylogenetic model and the site-specificity model into a single predictive model of conserved binding sites. Both of these approaches are based on alignments of orthologous genomic regions from two or more species. In this work, we use a simplified, theoretical model to study the statistical power of the latter approach to the prediction of features such as transcription factor binding sites. We investigate how many genomes are required, at varying evolutionary distances, to achieve specified levels of accuracy (false positive and false negative prediction rates). We show that this number depends strongly on the information content of the position-specific probability matrix and on the evolutionary model. We explore the effects of modifying the structure of the phylogenetic model, and conclude that placing the target genome at the root of the tree has a negligible effect on the power predicted by the model. Hence, as it is much easier to calculate, we can use the root-placed model as an approximation to phylogenetic motif scanning using real trees.
Finally, we perform an empirical study and demonstrate that the performance of current phylogenetic motif scanning programs is far from the theoretical limit of their power, leaving ample room for improvement.
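
The abstract states that the achievable accuracy depends strongly on the information content of the position-specific probability matrix. As a minimal sketch of that quantity, the following Python computes the standard relative-entropy information content of a motif against a background distribution. The function name, the uniform background, and the three-position toy motif are illustrative assumptions and are not taken from the paper, which may define or weight the measure differently.

```python
import math

def information_content(pspm, background=None):
    """Total information content, in bits, of a position-specific probability matrix.

    pspm: list of rows; each row holds the probabilities of A, C, G, T at one position.
    background: probabilities of A, C, G, T (assumed uniform when omitted).
    """
    if background is None:
        background = [0.25, 0.25, 0.25, 0.25]  # assumption: uniform background
    total = 0.0
    for row in pspm:
        for p, q in zip(row, background):
            if p > 0:  # the term 0 * log(0) is taken as 0
                total += p * math.log2(p / q)
    return total

# Hypothetical toy motif, invented for illustration only.
toy_motif = [
    [1.0, 0.0, 0.0, 0.0],      # a single allowed base
    [0.5, 0.5, 0.0, 0.0],      # two equally likely bases
    [0.25, 0.25, 0.25, 0.25],  # matches the background, so uninformative
]

print(information_content(toy_motif))
```

Under this definition, a position that matches the background contributes nothing, so a motif with many such positions carries little information and would be expected to need more comparative evidence to reach a given error rate.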

Editor information

Martin Vingron Limsoon Wong

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hawkins, J., Bailey, T.L. (2008). The Statistical Power of Phylogenetic Motif Models. In: Vingron, M., Wong, L. (eds) Research in Computational Molecular Biology. RECOMB 2008. Lecture Notes in Computer Science, vol 4955. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78839-3_10

  • DOI: https://doi.org/10.1007/978-3-540-78839-3_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78838-6

  • Online ISBN: 978-3-540-78839-3

  • eBook Packages: Computer Science (R0)
