StackSSSPred: A Stacking-Based Prediction of Supersecondary Structure from Sequence

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1958))

Abstract

Supersecondary structure (SSS) refers to specific geometric arrangements of several secondary structure (SS) elements that are connected by loops. The SSS can provide useful information about the spatial structure and function of a protein. As such, the SSS is a bridge between the secondary structure and the tertiary structure. In this chapter, we propose a stacking-based machine learning method for predicting two types of SSS, namely β-hairpin and β-α-β motifs, from the protein sequence based on comprehensive feature encoding. To encode protein residues, we utilize key features such as solvent accessibility, conservation profile, half surface exposure, torsion angle fluctuation, disorder probabilities, and more. The usefulness of the proposed approach is assessed using the widely used threefold cross-validation technique. The empirical results show that the proposed approach is useful and that the predictions can still be improved further.
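As a rough illustration of the two ideas named in the abstract, stacking and threefold cross-validation, the sketch below trains toy base learners, feeds their out-of-fold scores to a meta-learner, and estimates accuracy over three folds. Everything in it is an assumption made for illustration: the abstract does not specify the chapter's base learners, meta-learner, or feature layout, so `CentroidClassifier`, the four synthetic feature columns, and all function names are hypothetical stand-ins rather than the authors' implementation.

```python
import random

def k_fold_indices(n, k=3, seed=0):
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

class CentroidClassifier:
    """Toy learner (hypothetical): compares distances to the two class centroids."""

    def __init__(self, features):
        self.features = features  # column indices this learner looks at

    def fit(self, X, y):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(r[f] for r in rows) / len(rows)
                                     for f in self.features]
        return self

    def score(self, x):
        """Soft score in [0, 1]; values above 0.5 favour class 1."""
        d0, d1 = (sum((x[f] - c) ** 2
                      for f, c in zip(self.features, self.centroids[lab]))
                  for lab in (0, 1))
        return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5

def fit_stack(X, y, base_feature_sets, k=3):
    """Fit base learners, then a meta-learner on their out-of-fold scores."""
    meta_X = [[0.0] * len(base_feature_sets) for _ in X]
    for held in k_fold_indices(len(X), k, seed=1):
        held_set = set(held)
        train = [i for i in range(len(X)) if i not in held_set]
        for j, feats in enumerate(base_feature_sets):
            model = CentroidClassifier(feats).fit([X[i] for i in train],
                                                  [y[i] for i in train])
            for i in held:  # each sample is scored by a model that never saw it
                meta_X[i][j] = model.score(X[i])
    bases = [CentroidClassifier(f).fit(X, y) for f in base_feature_sets]
    meta = CentroidClassifier(list(range(len(bases)))).fit(meta_X, y)
    return bases, meta

def predict_stack(bases, meta, x):
    """Base scores become the meta-learner's input features."""
    return int(meta.score([b.score(x) for b in bases]) > 0.5)

def make_data(n=300, seed=0):
    """Synthetic stand-in for per-residue feature vectors (not real protein data)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        center = 1.0 if label else -1.0
        X.append([rng.gauss(center, 1.0) for _ in range(4)])
        y.append(label)
    return X, y

def cross_validated_accuracy(X, y, base_feature_sets, k=3):
    """Outer k-fold loop: each fold is predicted by a stack trained on the rest."""
    correct = 0
    for held in k_fold_indices(len(X), k, seed=0):
        held_set = set(held)
        train = [i for i in range(len(X)) if i not in held_set]
        bases, meta = fit_stack([X[i] for i in train], [y[i] for i in train],
                                base_feature_sets, k)
        correct += sum(predict_stack(bases, meta, X[i]) == y[i] for i in held)
    return correct / len(X)

X, y = make_data()
accuracy = cross_validated_accuracy(X, y, base_feature_sets=[[0, 1], [2, 3]])
print(f"threefold CV accuracy: {accuracy:.3f}")
```

The meta-learner is trained only on out-of-fold base scores so that it does not see predictions made on data the base learners were fitted to, which is the usual motivation for stacked generalization. A real pipeline would swap the toy learner for established classifiers and the synthetic rows for sequence-derived features.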

Acknowledgment

The authors gratefully acknowledge the Louisiana Board of Regents through the Board of Regents Support Fund LEQSF (2016-19)-RD-B-07.

Author information

Corresponding author

Correspondence to Md Tamjidul Hoque.

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Flot, M., Mishra, A., Kuchi, A.S., Hoque, M.T. (2019). StackSSSPred: A Stacking-Based Prediction of Supersecondary Structure from Sequence. In: Kister, A. (eds) Protein Supersecondary Structures. Methods in Molecular Biology, vol 1958. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-9161-7_5

  • DOI: https://doi.org/10.1007/978-1-4939-9161-7_5

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-9160-0

  • Online ISBN: 978-1-4939-9161-7

  • eBook Packages: Springer Protocols
