
StackSSSPred: A Stacking-Based Prediction of Supersecondary Structure from Sequence

  • Michael Flot
  • Avdesh Mishra
  • Aditi Sharma Kuchi
  • Md Tamjidul Hoque (Email author)
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1958)

Abstract

Supersecondary structure (SSS) refers to specific geometric arrangements of several secondary structure (SS) elements that are connected by loops. SSS can provide useful information about the spatial structure and function of a protein. As such, SSS is a bridge between secondary structure and tertiary structure. In this chapter, we propose a stacking-based machine learning method for predicting two types of SSS, namely β-hairpins and β-α-β, from the protein sequence based on comprehensive feature encoding. To encode protein residues, we use key features such as solvent accessibility, conservation profile, half surface exposure, torsion angle fluctuation, disorder probabilities, and more. We assess the proposed approach using threefold cross-validation, a widely used technique. The empirical results show that the approach is useful and that prediction can be improved further.
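
To make the idea of stacking assessed by threefold cross-validation concrete, the following is a minimal, standard-library-only Python sketch. It is an illustration under stated assumptions, not the chapter's implementation: the abstract does not name the base or meta learners, so a toy nearest-centroid classifier (hypothetical name `CentroidClassifier`) stands in for both, and the four synthetic feature columns stand in for real residue encodings such as solvent accessibility or disorder probability.

```python
import random


def k_fold_indices(n, k=3):
    """Deal indices 0..n-1 into k disjoint folds (assumes rows are already in random order)."""
    return [list(range(i, n, k)) for i in range(k)]


class CentroidClassifier:
    """Toy stand-in learner (an assumption, not the chapter's model)."""

    def __init__(self, columns):
        self.columns = columns  # feature columns this learner is allowed to see
        self.centroids = {}

    def _pick(self, row):
        return [row[c] for c in self.columns]

    def fit(self, X, y):
        for label in (0, 1):
            rows = [self._pick(x) for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def score(self, row):
        """Value in [0, 1]; larger means closer to the class-1 centroid."""
        v = self._pick(row)
        d0 = sum((a - b) ** 2 for a, b in zip(v, self.centroids[0])) ** 0.5
        d1 = sum((a - b) ** 2 for a, b in zip(v, self.centroids[1])) ** 0.5
        total = d0 + d1
        return d0 / total if total > 0 else 0.5

    def predict(self, row):
        return 1 if self.score(row) >= 0.5 else 0


def training_part(X, y, fold):
    """Return the rows and labels that are NOT in the held-out fold."""
    held = set(fold)
    keep = [i for i in range(len(X)) if i not in held]
    return [X[i] for i in keep], [y[i] for i in keep]


def out_of_fold_scores(X, y, factories, k=3):
    """Meta-features: each sample is scored by base learners that never saw it."""
    meta = [None] * len(X)
    for fold in k_fold_indices(len(X), k):
        X_tr, y_tr = training_part(X, y, fold)
        models = [make().fit(X_tr, y_tr) for make in factories]
        for i in fold:
            meta[i] = [m.score(X[i]) for m in models]
    return meta


def fit_stack(X, y, factories):
    """Train the meta learner on out-of-fold scores, then refit base learners on all rows."""
    meta = out_of_fold_scores(X, y, factories)
    meta_model = CentroidClassifier(list(range(len(factories)))).fit(meta, y)
    base_models = [make().fit(X, y) for make in factories]
    return base_models, meta_model


def predict_stack(stack, row):
    base_models, meta_model = stack
    return meta_model.predict([m.score(row) for m in base_models])


def cross_validated_accuracy(X, y, factories, k=3):
    """Outer k-fold loop: the stack is always evaluated on rows it was not trained on."""
    correct = 0
    for fold in k_fold_indices(len(X), k):
        X_tr, y_tr = training_part(X, y, fold)
        stack = fit_stack(X_tr, y_tr, factories)
        correct += sum(predict_stack(stack, X[i]) == y[i] for i in fold)
    return correct / len(X)


# Synthetic data (an assumption): 60 "residues", 4 features, labels alternate 0/1.
rng = random.Random(42)
y = [i % 2 for i in range(60)]
X = [[rng.gauss(2.0 * label, 0.5) for _ in range(4)] for label in y]

# Two base learners, each seeing a different half of the feature columns.
FACTORIES = [lambda: CentroidClassifier([0, 1]), lambda: CentroidClassifier([2, 3])]

print("threefold CV accuracy:", cross_validated_accuracy(X, y, FACTORIES))
```

The design point the sketch tries to show is that the meta learner is trained only on out-of-fold base-learner scores, so it does not learn from predictions the base learners made on their own training rows. In practice a library implementation with stronger learners would replace the toy classifier.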

Key words

Supersecondary structure prediction; Beta-hairpins; Beta-alpha-beta; Stacking; Machine learning; Sequence-based prediction

Notes

Acknowledgment

The authors gratefully acknowledge the Louisiana Board of Regents through the Board of Regents Support Fund LEQSF (2016-19)-RD-B-07.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Michael Flot (1)
  • Avdesh Mishra (1)
  • Aditi Sharma Kuchi (1)
  • Md Tamjidul Hoque (1) (Email author)
  1. Department of Computer Science, University of New Orleans, New Orleans, USA
