A novel improved prediction of protein structural class using deep recurrent neural network

  • Bishnupriya Panda
  • Babita Majhi
Special Issue


For the last few decades, the sequence arrangement of amino acids has been utilized for the prediction of protein secondary structure. Recent methods have applied high-dimensional, natural-language-based features in machine learning models. The performance of machine-learning-based models is significantly affected by data size and data dimensionality. It is a considerable challenge to develop a generic model that can be trained to perform well on both small and large datasets in a low-dimensional framework. In the present research, we propose a low-dimensional representation for both small and large datasets. A hybrid space of Atchley’s factors II, IV and V, the electron-ion interaction potential and SkipGram-based word2vec is employed for amino acid sequence representation. Subsequently, the Stockwell transform is applied to the representation to preserve features in both the time and frequency domains. Finally, a deep gated recurrent network with dropout, categorical cross-entropy error estimation and Adam optimization is used for classification. The introduced method yields better prediction accuracies for both small (204, 277 and 498) and large (PDB25, Protein 640 and FC699) benchmark datasets of low sequence similarity (25–40%). The classification accuracies obtained for the PDB25, 640, FC699, 498, 277 and 204 datasets are 84.2%, 94.31%, 93.1%, 95.9%, 94.5% and 85.36%, respectively. The major contributions of this research are twofold. First, we verify for the first time protein secondary structural class prediction in a very low-dimensional (18-D) feature space with a novel feature representation method. Second, we verify for the first time the behaviour of deep networks on low-dimensional, small datasets.
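
The Stockwell-transform step described in the abstract can be illustrated with a minimal sketch. It assumes the sequence has already been mapped to a single numeric track (for example, one physicochemical value per residue). The function names `normalised_dft` and `stockwell_transform`, the cosine test track and the symmetric wrapping of the Gaussian window are illustrative choices and are not taken from the paper.

```python
import cmath
import math


def normalised_dft(signal):
    """Return H[m] = (1/N) * sum_k h[k] * exp(-2*pi*i*m*k/N)."""
    n = len(signal)
    return [
        sum(signal[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n)) / n
        for m in range(n)
    ]


def stockwell_transform(signal):
    """Discrete S-transform; returns rows[f][t] for f = 0..N//2, t = 0..N-1."""
    n = len(signal)
    spectrum = normalised_dft(signal)
    # The zero-frequency row is defined as the signal mean at every position.
    rows = [[complex(sum(signal) / n)] * n]
    for f in range(1, n // 2 + 1):
        # Gaussian window in the frequency domain; its width grows with f.
        # Indices above N/2 are treated as negative frequencies (an assumption
        # that keeps the window symmetric on the periodic DFT grid).
        window = [
            math.exp(-2.0 * math.pi ** 2 * (m if m <= n // 2 else m - n) ** 2 / f ** 2)
            for m in range(n)
        ]
        shifted = [spectrum[(m + f) % n] * window[m] for m in range(n)]
        # Inverse DFT of the windowed, shifted spectrum gives one frequency row.
        rows.append([
            sum(shifted[m] * cmath.exp(2j * math.pi * m * t / n) for m in range(n))
            for t in range(n)
        ])
    return rows


# Hypothetical numeric track standing in for an encoded protein sequence.
track = [math.cos(2 * math.pi * 4 * k / 32) for k in range(32)]
magnitudes = [[abs(v) for v in row] for row in stockwell_transform(track)]
```

Here `magnitudes` holds one row per frequency index and one column per sequence position, so each entry describes how strongly a given periodicity is present around a given residue. The direct summation is cubic in the sequence length and is only meant for short inputs; an FFT-based variant would be the usual choice for longer sequences.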


Protein secondary structure prediction · SkipGram model · T-SNE · Atchley’s factors · DeepGRU · Bioinformatics


Author contributions

Both authors contributed equally. Both authors read and approved the final manuscript.

Compliance with ethical standards

Conflict of interest

We declare that we have no competing interests or conflicts of interest.

Supplementary material

12065_2018_171_MOESM1_ESM.docx (32 kb)
Supplementary material 1 (DOCX 31 KB)



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science and Engineering, Institute of Technical Education and Research, Siksha ‘O’ Anusandhan University, Bhubaneswar, India
  2. Department of Computer Science and Information Technology, Guru Ghashidas Vishwavidyalaya (A Central University), Bilaspur, India
