ENSEMBLE-CNN: Predicting DNA Binding Sites in Protein Sequences by an Ensemble Deep Learning Method

  • Conference paper
Intelligent Computing Theories and Application (ICIC 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10955)

Abstract

Detecting DNA binding sites in proteins is essential for understanding gene regulation. A major difficulty in developing machine learning predictors of DNA binding sites, however, is that binding sites are far fewer than non-binding sites. To handle this class imbalance, we propose a new predictor, named ENSEMBLE-CNN, which integrates instance selection and bootstrapping techniques to predict DNA-binding sites from protein primary sequences. ENSEMBLE-CNN uses a protein's evolutionary information and sequence features as its two basic feature types and employs a sampling strategy to deal with the class imbalance problem. Multiple initial predictors, each using a CNN as the classifier, are trained on data rebalanced by applying SMOTE to the minority (binding-site) class and random under-sampling to the original negative dataset. The final ensemble predictor is obtained by majority voting. The results demonstrate that ENSEMBLE-CNN achieves high prediction accuracy and outperforms existing sequence-based protein-DNA binding site predictors.
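The data-level part of this pipeline (rebalancing, then voting) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `smote`, `build_balanced_subsets` and `majority_vote` are hypothetical, as are the choice to pair SMOTE-augmented positives with an equally sized random draw of negatives in every subset, the neighbour count `k`, and the strict-majority rule in the vote. The CNN base learners and the feature encoding are not shown.

```python
# Hypothetical sketch: names, balancing ratio and k are assumptions, not from the paper.
import random
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: Sequence[Vector], n_new: int, k: int,
          rng: random.Random) -> List[Vector]:
    """SMOTE-style over-sampling: place each synthetic sample on the segment
    between a random minority sample and one of its k nearest minority
    neighbours. Needs at least two minority samples."""
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: _sq_dist(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> nb, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def build_balanced_subsets(pos: Sequence[Vector], neg: Sequence[Vector],
                           n_subsets: int, per_class: int, k: int = 5,
                           rng: Optional[random.Random] = None
                           ) -> List[Tuple[List[Vector], List[int]]]:
    """Return n_subsets balanced training sets, one per base classifier.

    Positives (binding sites) are topped up to per_class with SMOTE;
    negatives are randomly under-sampled, without replacement, to per_class,
    with a fresh draw for every subset.
    """
    rng = rng or random.Random(0)
    subsets = []
    for _ in range(n_subsets):
        if len(pos) >= per_class:
            positives = rng.sample(list(pos), per_class)
        else:
            positives = list(pos) + smote(pos, per_class - len(pos), k, rng)
        negatives = rng.sample(list(neg), per_class)
        X = positives + negatives
        y = [1] * len(positives) + [0] * len(negatives)
        subsets.append((X, y))
    return subsets


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine 0/1 predictions from several models (one row per model).

    A residue is labelled binding (1) only when more than half of the models
    vote 1; an odd number of models avoids ties.
    """
    n_models = len(predictions)
    return [int(sum(votes) * 2 > n_models) for votes in zip(*predictions)]


if __name__ == "__main__":
    rng = random.Random(42)
    pos = [[rng.random(), rng.random()] for _ in range(10)]       # rare class
    neg = [[1 + rng.random(), 1 + rng.random()] for _ in range(200)]
    subsets = build_balanced_subsets(pos, neg, n_subsets=5, per_class=40, rng=rng)
    print(len(subsets), len(subsets[0][0]))                    # 5 80
    print(majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]]))     # [1, 0, 1]
```

In this sketch each subset would train one CNN, and the per-residue 0/1 outputs of those CNNs would then be combined with `majority_vote`. Drawing a different random set of negatives for each subset is one way to make the base learners disagree enough for voting to help.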

Acknowledgement

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61702058 and 61772091, the China Postdoctoral Science Foundation under Grant No. 2017M612948, the Scientific Research Foundation for Advanced Talents of Chengdu University of Information Technology under Grant Nos. KYTZ201717, KYTZ201715 and KYTZ201750, the Scientific Research Foundation for Young Academic Leaders of Chengdu University of Information Technology under Grant Nos. J201701 and J201706, the Planning Foundation for Humanities and Social Sciences of the Ministry of Education of China under Grant No. 15YJAZH058, and the Innovative Research Team Construction Plan in Universities of Sichuan Province under Grant No. 18TD0027.

Author information

Correspondence to Shaojie Qiao.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Zhang, Y., Qiao, S., Ji, S., Zhou, J. (2018). ENSEMBLE-CNN: Predicting DNA Binding Sites in Protein Sequences by an Ensemble Deep Learning Method. In: Huang, DS., Jo, KH., Zhang, XL. (eds) Intelligent Computing Theories and Application. ICIC 2018. Lecture Notes in Computer Science, vol 10955. Springer, Cham. https://doi.org/10.1007/978-3-319-95933-7_37

  • DOI: https://doi.org/10.1007/978-3-319-95933-7_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-95932-0

  • Online ISBN: 978-3-319-95933-7

  • eBook Packages: Computer Science, Computer Science (R0)
