Statistical Learning and Modeling of TF-DNA Binding

Jiang, Bo; Liu, Jun S.

doi:10.1007/978-3-642-16345-6_3

Part of the book series: Springer Handbooks of Computational Statistics (SHCS)

Abstract

Discovering the binding sites and motifs of specific transcription factors (TFs) is an important first step toward understanding gene regulatory circuitry. Computational approaches have been developed to identify TF binding sites from a set of co-regulated genes. Recently, the abundance of gene expression data, ChIP-based TF-binding data (ChIP-array and ChIP-seq), and high-resolution epigenetic maps has raised the possibility of capturing sequence features relevant to TF-DNA interactions and thereby improving the predictive power of gene regulation models. In this chapter, we introduce statistical models and computational strategies for predicting TF-DNA interactions from DNA sequence information, and we describe a general predictive modeling framework for the TF-DNA binding problem. This framework encompasses both traditional regression methods and statistical learning methods, and it relies on selecting relevant sequence features and epigenetic markers as predictors.
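To make the "traditional regression" end of this framework more concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the chapter's own method. Each sequence is summarized by counts of a few candidate motif words, and a binding signal (for example, a ChIP enrichment log-ratio) is then regressed on those counts by ordinary least squares. The candidate word list is assumed to be given, reverse-complement matches are ignored, and the function names `count_word`, `feature_matrix`, and `fit_word_regression` are hypothetical. The toy data are synthetic.

```python
import numpy as np


def count_word(seq: str, word: str) -> int:
    """Count overlapping occurrences of `word` in `seq` (forward strand only;
    reverse-complement matches are deliberately ignored in this sketch)."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def feature_matrix(seqs, words):
    """Design matrix: an intercept column followed by one count column per word."""
    counts = np.array([[count_word(s, w) for w in words] for s in seqs], dtype=float)
    return np.hstack([np.ones((len(seqs), 1)), counts])


def fit_word_regression(seqs, y, words):
    """Ordinary least-squares fit of a binding signal y on motif-word counts.

    Returns the coefficient vector: element 0 is the intercept and
    element j (j >= 1) is the estimated effect of words[j - 1].
    """
    X = feature_matrix(seqs, words)
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef


# Toy usage with synthetic data: the signal is constructed as 1 + 2 * (#ACGT),
# so the fitted coefficients should be close to [1, 2, 0].
seqs = ["ACGTACGT", "TTTTACGT", "GGGGGGGG", "ACGTTTTT", "CCACGTCC"]
y = [5.0, 3.0, 1.0, 3.0, 3.0]
print(fit_word_regression(seqs, y, ["ACGT", "GGGG"]))
```

Under the same assumptions, a statistical learning method such as boosting or regression trees could replace the least-squares step while reusing the same feature matrix. Epigenetic markers such as nucleosome occupancy could be appended as additional columns.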

Author information

Authors and Affiliations

Department of Statistics, Harvard University, Cambridge, MA, 02138, USA
Bo Jiang & Jun S. Liu

Corresponding author

Correspondence to Bo Jiang.

Editor information

Editors and Affiliations

Institute of Statistics, National Chiao Tung University, Ta Hsueh Road 1001, Hsinchu, 30050, Taiwan R.O.C.
Henry Horng-Shing Lu
Department of Empirical Inference, MPI for Intelligent Systems, Spemannstraße 38, Tübingen, 72076, Germany
Bernhard Schölkopf
School of Medicine, Dept. Epidemiology & Public Health, Yale University, College Street 60, New Haven, 06520, Connecticut, USA
Hongyu Zhao

About this chapter

Cite this chapter

Jiang, B., Liu, J.S. (2011). Statistical Learning and Modeling of TF-DNA Binding. In: Lu, HS., Schölkopf, B., Zhao, H. (eds) Handbook of Statistical Bioinformatics. Springer Handbooks of Computational Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16345-6_3

DOI: https://doi.org/10.1007/978-3-642-16345-6_3
Published: 09 April 2011
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16344-9
Online ISBN: 978-3-642-16345-6
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)