Amino Acids, Volume 48, Issue 7, pp 1655–1665

SPAR: a random forest-based predictor for self-interacting proteins with fine-grained domain information

  • Xuhan Liu
  • Shiping Yang
  • Chen Li
  • Ziding Zhang
  • Jiangning Song
Original Article


Protein self-interaction, i.e. the interaction between two or more identical proteins expressed by one gene, plays an important role in the regulation of cellular functions. Given the limitations of experimental self-interaction identification, dedicated bioinformatics tools are needed to predict self-interacting proteins (SIPs) from protein sequence information. In this study, we propose an improved computational approach for SIP prediction, termed SPAR (Self-interacting Protein Analysis serveR). First, we developed an improved encoding scheme named critical residues substitution (CRS), which takes fine-grained domain–domain interaction information into account. Then, using the Random Forest algorithm, we evaluated the performance of CRS and compared it with several other encoding schemes commonly used for sequence-based protein–protein interaction prediction. In tenfold cross-validation tests on a balanced training dataset, CRS performed best, with an average accuracy of up to 72.01%. We further integrated CRS with other encoding schemes and identified the most important features using the mRMR (minimum redundancy maximum relevance) feature selection method. Our SPAR model with the selected features achieved an average accuracy of 92.09% on the independent human test set (ratio of positives to negatives about 1:11). We also evaluated SPAR on an independent yeast test set (ratio of positives to negatives about 1:8) and obtained an average accuracy of 76.96%. These results demonstrate that SPAR can achieve reasonable performance in cross-species application. The SPAR server is freely available for academic use at
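To make the feature selection step more concrete, the following is a minimal sketch of greedy mRMR selection in its difference form (relevance minus mean redundancy), written in plain Python for discrete features. It is an illustration only, not the SPAR implementation: the feature names (`f_a`, `f_a_dup`, `f_b`, `f_noise`), the toy values and the helper names `mutual_information` and `mrmr_select` are all hypothetical, and the abstract does not state how features were discretised or which mRMR variant was used.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(columns, labels, k):
    """Greedy mRMR, difference form: at each step pick the feature with the
    highest I(feature; label) minus its mean I(feature; already selected)."""
    relevance = {name: mutual_information(col, labels)
                 for name, col in columns.items()}
    selected = []
    candidates = set(columns)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(columns[name], columns[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while candidates and len(selected) < k:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data (assumed, for illustration): 1 = self-interacting, 0 = not.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "f_a":     [0, 0, 0, 1, 1, 1, 1, 1],  # informative
    "f_a_dup": [0, 0, 0, 1, 1, 1, 1, 1],  # exact duplicate of f_a
    "f_b":     [0, 0, 1, 0, 1, 0, 1, 1],  # weaker, but complementary
    "f_noise": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the label
}

selected = mrmr_select(features, labels, k=2)
print(selected)
```

In this toy case the duplicate `f_a_dup` is as relevant as `f_a`, but once `f_a` has been chosen its redundancy penalty pushes it below the weaker yet complementary `f_b`. This is the behaviour that distinguishes mRMR from ranking features by relevance alone.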


Keywords: Self-interacting protein; Prediction; Machine learning; Feature selection; Domain–domain interaction



We thank Dr. Yuan Zhou at China Agricultural University for helpful discussions on this work. This work was supported by grants from the National Natural Science Foundation of China (31271414, 31471249, 61202167, 61303169).

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Supplementary material

726_2016_2226_MOESM1_ESM.docx (24 kb)
Supplementary material 1 (DOCX 23 kb)



Copyright information

© Springer-Verlag Wien 2016

Authors and Affiliations

  1. State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, China
  2. Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia
  3. Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, Australia
  4. National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China
