INTERACT-O-FINDER: A Tool for Prediction of DNA-Binding Proteins Using Sequence Features

  • Monika Samant
  • Minesh Jethva
  • Yasha Hasija


Protein–DNA interactions carry out many important regulatory functions in our body which are essential for growth and survival. The molecular level understanding of these interactions helps us to decipher the mechanism of regulation. Conventionally these interactions were identified using small scale experimental techniques and high throughput technologies. But these approaches being time consuming and expensive, arouse the need of a computational approach for the prediction of interactions. To this end, a machine learning approach for predicting and classifying DNA-binding proteins has been used. Weka and LibSVM platform have been used to develop an effective classifying model. The different classifiers were applied on the attributes generated from protein sequence and tried to develop an efficient model for classifying a protein as DNA-interacting or DNA-non-interacting. Among several classifying algorithms applied to generate models, best performance was achieved using LibSVM with 87.83 % accuracy. The tool named INTERACT-O-FINDER, based on the prediction model is available at


DNA Attributes Classifiers Algorithms LibSVM 



Authors thank Nitin Thukral for scientific discussions.

Conflict of interest

The authors declare no conflict of interest.

Human and Animal Rights and Informed Consent

This article does not contain any studies with human or animal subjects performed by any of the authors.


  1. Ahmad S, Sarai A (2004) Moment-based prediction of DNA-binding proteins. J Mol Biol 341:65–71CrossRefPubMedGoogle Scholar
  2. András S, Jeffrey S (2006) Efficient prediction of nucleic acid binding function from low-resolution protein structures. J Mol Biol 358:922–933CrossRefGoogle Scholar
  3. Andrea S, Ondřej K, Filip Ž, Jakub T (2012) Prediction of DNA-binding proteins from relational features. Proteome Sci 10:66CrossRefGoogle Scholar
  4. Bhardwaj N, Langlois RE, Zhao G, Lu H (2005) Kernel-based machine learning protocol for predicting DNA-binding proteins. Nucleic Acids Res 33:6486–6493CrossRefPubMedCentralPubMedGoogle Scholar
  5. Blouin S, Craggs TD, Lafontaine DA, Penedo JC (2009) Functional studies of DNA–protein interactions using FRET techniques. Methods Mol Biol 543:475–502CrossRefPubMedGoogle Scholar
  6. Caragea C, Caragea D, Silvescu A, Honavar V (2010) Semi-supervised prediction of protein subcellular localization using abstraction augmented Markov models. BMC Bioinform 11:S6CrossRefGoogle Scholar
  7. Chang CC and Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2:27:1–27:27Google Scholar
  8. Chawla NV, Japkowicz N, Kotcz A (2004) Editorial: special issue on learning from imbalanced data sets. SIGKDD Explor Newsl 6:1–6CrossRefGoogle Scholar
  9. Chu WY, Huang YF, Huang CC, Cheng YS, Huang CK, Oyang YJ (2009) ProteDNA: a sequence-based predictor of sequence-specific DNA-binding residues in transcription factors. Nucleic Acids Res 37:W396–W401CrossRefPubMedCentralPubMedGoogle Scholar
  10. Chuanxin Z, Jiayu G, Honglin L (2013) An improved sequence based prediction protocol for DNA-binding proteins using SVM and comprehensive feature analysis. BMC Bioinform 14:90CrossRefGoogle Scholar
  11. Fang Y, Guo Y, Feng Y, Li M (2008) Predicting DNA-binding proteins: approached from Chou’s pseudo amino acid composition and other specific sequence features. Amino Acids 34:103–109CrossRefPubMedGoogle Scholar
  12. Gao M, Skolnick J (2009) A threading-based method for the prediction of DNA-binding proteins with application to the human genome. PLoS Comput Biol 5:e1000567CrossRefPubMedCentralPubMedGoogle Scholar
  13. Guy N, András S, Christina L, Nir BT (2009) Identification of DNA-binding proteins using structural, electrostatic and evolutionary features. J Mol Biol 387:1040–1053CrossRefGoogle Scholar
  14. Huang Y, Niu B, Gao Y, Limin F, Li W (2010) CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics 26:680–682CrossRefPubMedCentralPubMedGoogle Scholar
  15. Hwang S, Gou Z, Kuznetsov IB (2007) DP-Bind: a web server for sequence-based prediction of DNA-binding residues in DNA-binding proteins. Bioinformatics 23:634–636CrossRefPubMedGoogle Scholar
  16. Jones DT (1999) Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol 292:195–202CrossRefPubMedGoogle Scholar
  17. Jones S, Shanahan HP, Berman HM, Thornton JM (2003) Using electrostatic potentials to predict DNA-binding sites on DNA-binding proteins. Nucleic Acids Res 31:7189–7198CrossRefPubMedCentralPubMedGoogle Scholar
  18. Kumar M, Gromiha M, Raghava G (2007) Identification of DNA-binding proteins using support vector machines and evolutionary profiles. BMC Bioinform 8:463CrossRefGoogle Scholar
  19. Langlois RE, Lu H (2010) Boosting the prediction and understanding of DNA-binding domains from sequence. Nucleic Acids Res 38:3149–3158CrossRefPubMedCentralPubMedGoogle Scholar
  20. Larrañaga Pedro, Calvo Borja, Santana Roberto, Bielza Concha et al (2006) Machine learning in bioinformatics. Brief Bioinform 7:86–112CrossRefPubMedGoogle Scholar
  21. Lin WZ, Fang JA, Xiao X, Chou KC (2011) IDNA-prot: identification of DNA binding proteins using random forest with grey model. PLoS ONE 6:e24756CrossRefPubMedCentralPubMedGoogle Scholar
  22. Liu LA, Bradley P (2012) Atomistic modeling of protein–DNA interaction specificity: progress and applications. Curr Opin Struct Biol 22:397–405CrossRefPubMedCentralPubMedGoogle Scholar
  23. Luscombe NM, Austin SE, Berman HM, Thornton JM (2000) An overview of the structures of protein–DNA complexes. Genome Biol 1:1–37CrossRefGoogle Scholar
  24. Mark Hall, Eib Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, Ian Witten H (2009) The WEKA data mining software: an update. SIGKDD Explor Newsl 11:10–18CrossRefGoogle Scholar
  25. Nagarajan R, Shander A, Gromiha M (2013) Novel approach for selecting best predictor for identifying binding sites in DNA binding proteins. Nucleic Acids Res 41:7606–7614CrossRefPubMedCentralPubMedGoogle Scholar
  26. Ofran Y, Mysore V, Rost B (2007) Prediction of DNA-binding residues from sequence. Bioinformatics 23:i347–i353CrossRefPubMedGoogle Scholar
  27. Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, Simon I, Zeitlinger J, Schreiber J, Hannett N, Kanin E et al (2000) Genome-wide location and function of DNA binding proteins. Science 290:2306–2309CrossRefPubMedGoogle Scholar
  28. Sarai A, Kono H (2005) Protein–DNA recognition patterns and predictions. Annu Rev Biophys Biomol Struct 34:379–398CrossRefPubMedGoogle Scholar
  29. Shao X, Tian Y, Wu L, Wang Y, Jing L, Deng N (2009) Predicting DNA-and RNA-binding proteins from sequences with kernel methods. J Theor Biol 258:289–293CrossRefPubMedGoogle Scholar
  30. Si J, Zhang Z, Lin B, Schroeder M, Huang B (2011) MetaDBSite: a meta approach to improve protein DNA-binding sites prediction. BMC Syst Biol 5:S7CrossRefPubMedCentralPubMedGoogle Scholar
  31. Stawiski EW, Gregoret LM, Mandel-Gutfreund Y (2003) Annotating nucleic acid-binding function based on protein structure. J Mol Biol 326:1065–1079CrossRefPubMedGoogle Scholar
  32. Tarca AL, Carey VJ, Chen XW, Romero R, Draghici S (2007) Machine learning and its applications to biology. PLoS Comput Biol 3:e116CrossRefPubMedCentralPubMedGoogle Scholar
  33. Tjong H, Zhou HX (2007) DISPLAR: an accurate method for predicting DNA-binding sites on protein surfaces. Nucleic Acids Res 35:1465–1477CrossRefPubMedCentralPubMedGoogle Scholar
  34. Wang L, Brown SJ (2006) BindN: a web-based tool for efficient prediction of DNA and RNA binding sites in amino acid sequences. Nucleic Acids Res 34:W243–W248CrossRefPubMedCentralPubMedGoogle Scholar
  35. Wang G, Dunbrack RL Jr (2003) PISCES: a protein sequence culling server. Bioinformatics 19:1589–1591CrossRefPubMedGoogle Scholar
  36. Xiong Y, Liu J, Wei DQ (2011) An accurate feature-based method for identifying DNA-binding residues on protein surfaces. Proteins 79:509–517CrossRefPubMedGoogle Scholar
  37. Yan C, Terribilini M, Wu F, Jernigan R, Dobbs D, Honavar V (2006) Predicting DNA-binding sites of proteins from amino acid sequence. BMC Bioinform 7:262CrossRefGoogle Scholar
  38. Yang L, Xia JF, Gui J (2010) Prediction of protein–protein interactions from protein sequence using local descriptors. Protein Pept Lett 17:1085–1090CrossRefPubMedGoogle Scholar
  39. Zhao H, Yang Y, Zhou Y (2010) Structure-based prediction of DNA-binding proteins by structural alignment and a volume-fraction corrected DFIRE-based energy function. Bioinformatics 26:1857–1863CrossRefPubMedCentralPubMedGoogle Scholar
  40. Zhou W, Yan H (2011) Prediction of DNA-binding protein based on statistical and geometric features and support vector machines. Proteome Sci 9(Suppl 1):S1. doi: 10.1186/1477-5956-9-S1-S1 CrossRefPubMedCentralPubMedGoogle Scholar

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. 1.Department of BiotechnologyDelhi Technological UniversityDelhiIndia
  2. 2.Synthetic Biology and BiofuelsInternational Center for Genetic Engineering and BiotechnologyDelhiIndia

Personalised recommendations