Combining Sequence and Epigenomic Data to Predict Transcription Factor Binding Sites Using Deep Learning

Jing, Fang; Zhang, Shao-Wu; Cao, Zhen; Zhang, Shihua

doi:10.1007/978-3-319-94968-0_23

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10847)

Included in the following conference series:

International Symposium on Bioinformatics Research and Applications

Abstract

Knowing where transcription factor binding sites (TFBSs) lie is essential for modeling the underlying binding mechanisms and the cellular functions that follow. Convolutional neural networks (CNNs) have outperformed other methods in predicting TFBSs from the primary DNA sequence. Beyond the DNA sequence, histone modifications and chromatin accessibility also influence the activity of these sites, and both have recently been used to predict TFBSs. However, current methods rarely incorporate histone modifications and chromatin accessibility into a CNN within an integrative framework. To this end, we developed a general CNN model that integrates these data to predict TFBSs. We systematically benchmarked a series of architecture variants by changing the width and depth of the network, and explored the effect of sample length at the flanking regions. We evaluated the performance of the three types of data and their combinations using 256 ChIP-seq experiments, and compared the model with competing machine learning methods. We find that the contributions of the three types of data are complementary. Moreover, the integrative CNN framework is superior to traditional machine learning methods, with significant improvements.
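The abstract does not spell out how the three data types enter the network, so the following is only a minimal sketch of one plausible arrangement: the sequence is one-hot encoded and the histone and accessibility signals are appended as extra input channels before a single convolution, ReLU, and global max pooling step. The function names (`one_hot`, `stack_inputs`, `conv1d_relu_maxpool`), the channel layout, and the toy shapes are all assumptions made for illustration, not the chapter's actual architecture.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) array; unknown bases such as N stay all-zero."""
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            out[i, j] = 1.0
    return out

def stack_inputs(seq, histone, accessibility):
    """Concatenate sequence, histone, and accessibility signals along the channel axis.

    histone: (length, n_marks) array; accessibility: (length,) array.
    This per-position channel layout is an assumption for the sketch.
    """
    x_seq = one_hot(seq)
    histone = np.asarray(histone, dtype=np.float32)
    acc = np.asarray(accessibility, dtype=np.float32).reshape(-1, 1)
    if not (len(x_seq) == len(histone) == len(acc)):
        raise ValueError("all inputs must cover the same number of positions")
    return np.concatenate([x_seq, histone, acc], axis=1)

def conv1d_relu_maxpool(x, kernels):
    """Valid 1D convolution over positions, ReLU, then global max pooling.

    x: (length, channels); kernels: (n_filters, width, channels).
    Returns one pooled score per filter.
    """
    n_filters, width, channels = kernels.shape
    # windows has shape (length - width + 1, width, channels)
    windows = np.lib.stride_tricks.sliding_window_view(x, (width, channels))[:, 0]
    act = np.einsum("pwc,fwc->pf", windows, kernels)
    return np.maximum(act, 0.0).max(axis=0)

# Toy example: 5 bp, 2 hypothetical histone marks, 1 accessibility track.
x = stack_inputs("ACGTN", np.ones((5, 2)), np.arange(5))
scores = conv1d_relu_maxpool(x, np.ones((3, 2, 7), dtype=np.float32))
```

Stacking the signals as channels lets one filter respond jointly to a sequence motif and its local chromatin context; separate convolutional branches merged at a later layer would be an equally reasonable design under the same assumptions.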

Acknowledgement

Fang Jing would like to thank the National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, CAS, for its support during his visit. The work was supported by the National Natural Science Foundation of China [No. 61473232 and 91430111 to SWZ; No. 61621003 and 11661141019 to SZ]; the Strategic Priority Research Program of the Chinese Academy of Sciences (CAS) [No. XDB13040600]; the Key Research Program of the Chinese Academy of Sciences [No. KFZD-SW-219]; and the CAS Frontier Science Research Key Project for Top Young Scientist [No. QYZDB-SSW-SYS008].

Author information

Authors and Affiliations

Key Laboratory of Information Fusion Technology of Ministry of Education, College of Automation, Northwestern Polytechnical University, Xi’an, 710072, China
Fang Jing & Shao-Wu Zhang
NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China
Zhen Cao & Shihua Zhang
School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
Shihua Zhang

Corresponding authors

Correspondence to Shao-Wu Zhang or Shihua Zhang.

Editor information

Editors and Affiliations

Chinese Academy of Sciences, Beijing, China
Fa Zhang
Georgia State University, Atlanta, GA, USA
Zhipeng Cai
Georgia State University, Atlanta, GA, USA
Pavel Skums
Chinese Academy of Sciences, Beijing, China
Shihua Zhang

About this paper

Cite this paper

Jing, F., Zhang, SW., Cao, Z., Zhang, S. (2018). Combining Sequence and Epigenomic Data to Predict Transcription Factor Binding Sites Using Deep Learning. In: Zhang, F., Cai, Z., Skums, P., Zhang, S. (eds) Bioinformatics Research and Applications. ISBRA 2018. Lecture Notes in Computer Science, vol 10847. Springer, Cham. https://doi.org/10.1007/978-3-319-94968-0_23

DOI: https://doi.org/10.1007/978-3-319-94968-0_23
Published: 13 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94967-3
Online ISBN: 978-3-319-94968-0
eBook Packages: Computer Science, Computer Science (R0)
