Statistical Learning and Modeling of TF-DNA Binding

Handbook of Statistical Bioinformatics

Part of the book series: Springer Handbooks of Computational Statistics (SHCS)

Abstract

Discovering the binding sites and motifs of specific transcription factors (TFs) is an important first step towards understanding gene regulatory circuitry. Computational approaches have been developed to identify transcription factor binding sites from a set of co-regulated genes. More recently, the abundance of gene expression data, ChIP-based TF-binding data (ChIP-array/seq), and high-resolution epigenetic maps has made it possible to capture sequence features relevant to TF-DNA interactions and so improve the predictive power of gene regulation modeling. In this chapter, we introduce some statistical models and computational strategies used to predict TF-DNA interactions from DNA sequence information, and describe a general framework of predictive modeling approaches to the TF-DNA binding problem. The framework includes both traditional regression methods and statistical learning methods, and relies on selecting relevant sequence features and epigenetic markers.

Author information

Corresponding author

Correspondence to Bo Jiang.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Jiang, B., Liu, J.S. (2011). Statistical Learning and Modeling of TF-DNA Binding. In: Lu, HS., Schölkopf, B., Zhao, H. (eds) Handbook of Statistical Bioinformatics. Springer Handbooks of Computational Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16345-6_3
