Abstract
Protein–DNA interactions carry out many important regulatory functions in our body which are essential for growth and survival. The molecular level understanding of these interactions helps us to decipher the mechanism of regulation. Conventionally these interactions were identified using small scale experimental techniques and high throughput technologies. But these approaches being time consuming and expensive, arouse the need of a computational approach for the prediction of interactions. To this end, a machine learning approach for predicting and classifying DNA-binding proteins has been used. Weka and LibSVM platform have been used to develop an effective classifying model. The different classifiers were applied on the attributes generated from protein sequence and tried to develop an efficient model for classifying a protein as DNA-interacting or DNA-non-interacting. Among several classifying algorithms applied to generate models, best performance was achieved using LibSVM with 87.83 % accuracy. The tool named INTERACT-O-FINDER, based on the prediction model is available at http://interacto.eurekanow.org/index.html.
Similar content being viewed by others
References
Ahmad S, Sarai A (2004) Moment-based prediction of DNA-binding proteins. J Mol Biol 341:65–71
András S, Jeffrey S (2006) Efficient prediction of nucleic acid binding function from low-resolution protein structures. J Mol Biol 358:922–933
Andrea S, Ondřej K, Filip Ž, Jakub T (2012) Prediction of DNA-binding proteins from relational features. Proteome Sci 10:66
Bhardwaj N, Langlois RE, Zhao G, Lu H (2005) Kernel-based machine learning protocol for predicting DNA-binding proteins. Nucleic Acids Res 33:6486–6493
Blouin S, Craggs TD, Lafontaine DA, Penedo JC (2009) Functional studies of DNA–protein interactions using FRET techniques. Methods Mol Biol 543:475–502
Caragea C, Caragea D, Silvescu A, Honavar V (2010) Semi-supervised prediction of protein subcellular localization using abstraction augmented Markov models. BMC Bioinform 11:S6
Chang CC and Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2:27:1–27:27
Chawla NV, Japkowicz N, Kotcz A (2004) Editorial: special issue on learning from imbalanced data sets. SIGKDD Explor Newsl 6:1–6
Chu WY, Huang YF, Huang CC, Cheng YS, Huang CK, Oyang YJ (2009) ProteDNA: a sequence-based predictor of sequence-specific DNA-binding residues in transcription factors. Nucleic Acids Res 37:W396–W401
Chuanxin Z, Jiayu G, Honglin L (2013) An improved sequence based prediction protocol for DNA-binding proteins using SVM and comprehensive feature analysis. BMC Bioinform 14:90
Fang Y, Guo Y, Feng Y, Li M (2008) Predicting DNA-binding proteins: approached from Chou’s pseudo amino acid composition and other specific sequence features. Amino Acids 34:103–109
Gao M, Skolnick J (2009) A threading-based method for the prediction of DNA-binding proteins with application to the human genome. PLoS Comput Biol 5:e1000567
Guy N, András S, Christina L, Nir BT (2009) Identification of DNA-binding proteins using structural, electrostatic and evolutionary features. J Mol Biol 387:1040–1053
Huang Y, Niu B, Gao Y, Limin F, Li W (2010) CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics 26:680–682
Hwang S, Gou Z, Kuznetsov IB (2007) DP-Bind: a web server for sequence-based prediction of DNA-binding residues in DNA-binding proteins. Bioinformatics 23:634–636
Jones DT (1999) Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol 292:195–202
Jones S, Shanahan HP, Berman HM, Thornton JM (2003) Using electrostatic potentials to predict DNA-binding sites on DNA-binding proteins. Nucleic Acids Res 31:7189–7198
Kumar M, Gromiha M, Raghava G (2007) Identification of DNA-binding proteins using support vector machines and evolutionary profiles. BMC Bioinform 8:463
Langlois RE, Lu H (2010) Boosting the prediction and understanding of DNA-binding domains from sequence. Nucleic Acids Res 38:3149–3158
Larrañaga Pedro, Calvo Borja, Santana Roberto, Bielza Concha et al (2006) Machine learning in bioinformatics. Brief Bioinform 7:86–112
Lin WZ, Fang JA, Xiao X, Chou KC (2011) IDNA-prot: identification of DNA binding proteins using random forest with grey model. PLoS ONE 6:e24756
Liu LA, Bradley P (2012) Atomistic modeling of protein–DNA interaction specificity: progress and applications. Curr Opin Struct Biol 22:397–405
Luscombe NM, Austin SE, Berman HM, Thornton JM (2000) An overview of the structures of protein–DNA complexes. Genome Biol 1:1–37
Mark Hall, Eib Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, Ian Witten H (2009) The WEKA data mining software: an update. SIGKDD Explor Newsl 11:10–18
Nagarajan R, Shander A, Gromiha M (2013) Novel approach for selecting best predictor for identifying binding sites in DNA binding proteins. Nucleic Acids Res 41:7606–7614
Ofran Y, Mysore V, Rost B (2007) Prediction of DNA-binding residues from sequence. Bioinformatics 23:i347–i353
Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, Simon I, Zeitlinger J, Schreiber J, Hannett N, Kanin E et al (2000) Genome-wide location and function of DNA binding proteins. Science 290:2306–2309
Sarai A, Kono H (2005) Protein–DNA recognition patterns and predictions. Annu Rev Biophys Biomol Struct 34:379–398
Shao X, Tian Y, Wu L, Wang Y, Jing L, Deng N (2009) Predicting DNA-and RNA-binding proteins from sequences with kernel methods. J Theor Biol 258:289–293
Si J, Zhang Z, Lin B, Schroeder M, Huang B (2011) MetaDBSite: a meta approach to improve protein DNA-binding sites prediction. BMC Syst Biol 5:S7
Stawiski EW, Gregoret LM, Mandel-Gutfreund Y (2003) Annotating nucleic acid-binding function based on protein structure. J Mol Biol 326:1065–1079
Tarca AL, Carey VJ, Chen XW, Romero R, Draghici S (2007) Machine learning and its applications to biology. PLoS Comput Biol 3:e116
Tjong H, Zhou HX (2007) DISPLAR: an accurate method for predicting DNA-binding sites on protein surfaces. Nucleic Acids Res 35:1465–1477
Wang L, Brown SJ (2006) BindN: a web-based tool for efficient prediction of DNA and RNA binding sites in amino acid sequences. Nucleic Acids Res 34:W243–W248
Wang G, Dunbrack RL Jr (2003) PISCES: a protein sequence culling server. Bioinformatics 19:1589–1591
Xiong Y, Liu J, Wei DQ (2011) An accurate feature-based method for identifying DNA-binding residues on protein surfaces. Proteins 79:509–517
Yan C, Terribilini M, Wu F, Jernigan R, Dobbs D, Honavar V (2006) Predicting DNA-binding sites of proteins from amino acid sequence. BMC Bioinform 7:262
Yang L, Xia JF, Gui J (2010) Prediction of protein–protein interactions from protein sequence using local descriptors. Protein Pept Lett 17:1085–1090
Zhao H, Yang Y, Zhou Y (2010) Structure-based prediction of DNA-binding proteins by structural alignment and a volume-fraction corrected DFIRE-based energy function. Bioinformatics 26:1857–1863
Zhou W, Yan H (2011) Prediction of DNA-binding protein based on statistical and geometric features and support vector machines. Proteome Sci 9(Suppl 1):S1. doi:10.1186/1477-5956-9-S1-S1
Acknowledgement
Authors thank Nitin Thukral for scientific discussions.
Conflict of interest
The authors declare no conflict of interest.
Human and Animal Rights and Informed Consent
This article does not contain any studies with human or animal subjects performed by any of the authors.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Samant, M., Jethva, M. & Hasija, Y. INTERACT-O-FINDER: A Tool for Prediction of DNA-Binding Proteins Using Sequence Features. Int J Pept Res Ther 21, 189–193 (2015). https://doi.org/10.1007/s10989-014-9446-4
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10989-014-9446-4