StackSSSPred: A Stacking-Based Prediction of Supersecondary Structure from Sequence

Flot, Michael; Mishra, Avdesh; Kuchi, Aditi Sharma; Hoque, Md Tamjidul

doi:10.1007/978-1-4939-9161-7_5

Part of the book series: Methods in Molecular Biology (MIMB, volume 1958)

Abstract

Supersecondary structure (SSS) refers to specific geometric arrangements of several secondary structure (SS) elements connected by loops. Because the SSS carries useful information about the spatial structure and function of a protein, it serves as a bridge between secondary and tertiary structure. In this chapter, we propose a stacking-based machine learning method that predicts two types of SSS, β-hairpins and β-α-β motifs, from the protein sequence using a comprehensive feature encoding. Each residue is encoded with features such as solvent accessibility, conservation profile, half-sphere exposure, torsion angle fluctuation, and disorder probability, among others. We assess the approach with threefold cross-validation. The empirical results show that the proposed approach is useful and also indicate that prediction performance can be improved further.
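
The stacking idea described above can be illustrated with a short sketch. It assumes scikit-learn, synthetic data from make_classification standing in for per-residue feature vectors, and an arbitrary set of base learners (k-nearest neighbors, extremely randomized trees, gradient boosting) feeding a logistic-regression meta-learner. These choices are illustrative only and are not StackSSSPred's published configuration. Base-learner probabilities are produced out of fold (inner cv=3), so the meta-learner is never trained on predictions a base learner made for its own training data, and an outer loop performs threefold stratified cross-validation.

```python
"""Minimal stacking sketch with threefold cross-validation (illustrative only)."""
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_stacked_model(seed: int = 0) -> StackingClassifier:
    """Base learners (hypothetical choice) feed probabilities to a meta-learner."""
    base_learners = [
        # KNN is distance based, so features are standardized first.
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=7))),
        ("et", ExtraTreesClassifier(n_estimators=200, random_state=seed)),
        ("gbc", GradientBoostingClassifier(random_state=seed)),
    ]
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        stack_method="predict_proba",  # meta-features are class probabilities
        cv=3,                          # inner folds build out-of-fold meta-features
        passthrough=False,             # meta-learner sees only base predictions
    )


def evaluate_threefold(X, y, seed: int = 0) -> np.ndarray:
    """Outer threefold stratified cross-validation of the stacked model."""
    outer = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    return cross_val_score(
        build_stacked_model(seed), X, y, cv=outer, scoring="balanced_accuracy"
    )


if __name__ == "__main__":
    # Hypothetical stand-in data: each row plays the role of one residue's
    # feature vector; label 1 = residue inside an SSS motif, 0 = outside.
    X, y = make_classification(
        n_samples=600, n_features=40, n_informative=12,
        weights=[0.8, 0.2], random_state=0,
    )
    scores = evaluate_threefold(X, y)
    print("balanced accuracy per fold:", np.round(scores, 3))
    print("mean:", round(float(scores.mean()), 3))
```

Balanced accuracy is used here because the synthetic labels are deliberately imbalanced (roughly 80/20); with real per-residue data, the metric and base learners would need to be chosen to match the actual class distribution.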

Acknowledgment

The authors gratefully acknowledge the Louisiana Board of Regents through the Board of Regents Support Fund LEQSF (2016-19)-RD-B-07.

Author information

Michael Flot and Avdesh Mishra contributed equally to this work.

Authors and Affiliations

Department of Computer Science, University of New Orleans, New Orleans, LA, USA
Michael Flot, Avdesh Mishra, Aditi Sharma Kuchi & Md Tamjidul Hoque

Corresponding author

Correspondence to Md Tamjidul Hoque.

Editor information

Editors and Affiliations

Department of Mathematics, Rutgers University, Piscataway, NJ, USA
Alexander E. Kister

About this protocol

Cite this protocol

Flot, M., Mishra, A., Kuchi, A.S., Hoque, M.T. (2019). StackSSSPred: A Stacking-Based Prediction of Supersecondary Structure from Sequence. In: Kister, A. (eds) Protein Supersecondary Structures. Methods in Molecular Biology, vol 1958. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-9161-7_5

DOI: https://doi.org/10.1007/978-1-4939-9161-7_5
Published: 04 April 2019
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-9160-0
Online ISBN: 978-1-4939-9161-7
eBook Packages: Springer Protocols