INTERACT-O-FINDER: A Tool for Prediction of DNA-Binding Proteins Using Sequence Features

International Journal of Peptide Research and Therapeutics

Abstract

Protein–DNA interactions carry out many regulatory functions in the body that are essential for growth and survival. Understanding these interactions at the molecular level helps to decipher the mechanisms of regulation. Conventionally, these interactions have been identified using small-scale experimental techniques and high-throughput technologies. Because these approaches are time-consuming and expensive, there is a need for a computational approach to predicting such interactions. To this end, a machine learning approach was used to predict and classify DNA-binding proteins. The Weka and LibSVM platforms were used to develop the classification model. Several classifiers were applied to attributes generated from the protein sequence in order to develop an efficient model for classifying a protein as DNA-interacting or DNA-non-interacting. Among the classification algorithms applied, the best performance was achieved using LibSVM, with 87.83% accuracy. The tool based on this prediction model, named INTERACT-O-FINDER, is available at http://interacto.eurekanow.org/index.html.
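
The abstract does not list the sequence attributes that were used, so the following is only a minimal sketch of how sequence-derived features could be prepared for LibSVM. It assumes amino acid composition as the feature set, which is a common choice and not necessarily the authors' own. The function names and the example sequence are hypothetical. The output follows LIBSVM's sparse text format, `<label> <index>:<value> ...`.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines feature indices 1..20.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue in the sequence.

    Assumption: composition is used here only as an illustrative feature set.
    Non-standard characters (e.g. X, B, Z) are ignored.
    """
    seq = sequence.upper()
    counts = Counter(residue for residue in seq if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def to_libsvm_line(label, features):
    """Format one example as a LIBSVM sparse line, omitting zero-valued features."""
    pairs = [
        f"{index}:{value:.4f}"
        for index, value in enumerate(features, start=1)
        if value != 0
    ]
    return f"{label:+d} " + " ".join(pairs)


if __name__ == "__main__":
    # Hypothetical toy sequence; +1 marks DNA-interacting, -1 DNA-non-interacting.
    features = amino_acid_composition("MKKA")
    print(to_libsvm_line(+1, features))
```

Lines produced this way can be written to a file, one example per line, and passed to LibSVM's training tools. Any real use would need a curated set of DNA-binding and non-binding sequences and, most likely, richer features than composition alone.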

Acknowledgement

The authors thank Nitin Thukral for scientific discussions.

Conflict of interest

The authors declare no conflict of interest.

Human and Animal Rights and Informed Consent

This article does not contain any studies with human or animal subjects performed by any of the authors.

Author information

Corresponding author

Correspondence to Yasha Hasija.

About this article

Cite this article

Samant, M., Jethva, M. & Hasija, Y. INTERACT-O-FINDER: A Tool for Prediction of DNA-Binding Proteins Using Sequence Features. Int J Pept Res Ther 21, 189–193 (2015). https://doi.org/10.1007/s10989-014-9446-4

  • DOI: https://doi.org/10.1007/s10989-014-9446-4