Enhancing Protein Fold Prediction Accuracy Using Evolutionary and Structural Features

  • Abdollah Dehzangi
  • Kuldip Paliwal
  • James Lyons
  • Alok Sharma
  • Abdul Sattar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7986)


Protein fold recognition (PFR) is considered an important step towards solving the protein structure prediction problem. It also provides crucial information about protein function. Despite the efforts made over the past two decades, finding an accurate and fast computational approach to PFR remains a challenging problem in bioinformatics and computational biology. It has been shown that extracting features which carry significant local and global discriminatory information plays a key role in addressing this problem. In this study, we propose a segmentation-based feature extraction technique that captures local evolutionary information embedded in the Position Specific Scoring Matrix (PSSM) and structural information embedded in the secondary structure predicted by SPINE-X. We also employ occurrence features to extract global discriminatory information from PSSM and SPINE-X. By applying a Support Vector Machine (SVM) to the extracted features, we improve protein fold prediction accuracy by 7.4% over the best results reported in the literature.
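
To make the distinction between global and local features concrete, the following is a minimal Python sketch and not the exact method of the paper. It assumes the PSSM is given as a list of rows (one per residue, one column per amino acid), takes the occurrence feature to be the length-normalised column sum, and uses plain per-segment column means as a simplified stand-in for the segmented features. The function names `occurrence_features` and `segmented_features`, the normalisation, and the equal-length segmentation are all illustrative assumptions.

```python
from typing import List

# Assumed layout: one row per residue, one column per amino acid.
Matrix = List[List[float]]


def occurrence_features(pssm: Matrix) -> List[float]:
    """Global feature (assumed form): column sums divided by sequence length."""
    if not pssm:
        raise ValueError("PSSM must contain at least one row")
    length = len(pssm)
    n_cols = len(pssm[0])
    return [sum(row[j] for row in pssm) / length for j in range(n_cols)]


def segmented_features(pssm: Matrix, n_segments: int) -> List[float]:
    """Local features (simplified stand-in): column means of each segment,
    concatenated in sequence order."""
    length = len(pssm)
    if n_segments < 1 or n_segments > length:
        raise ValueError("n_segments must be between 1 and the sequence length")
    features: List[float] = []
    for s in range(n_segments):
        # Integer arithmetic splits the sequence into near-equal segments.
        start = s * length // n_segments
        end = (s + 1) * length // n_segments
        features.extend(occurrence_features(pssm[start:end]))
    return features


# Toy 4-residue, 2-column matrix used purely for illustration.
toy_pssm = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
global_part = occurrence_features(toy_pssm)    # one value per column
local_part = segmented_features(toy_pssm, 2)   # one value per column per segment
feature_vector = global_part + local_part      # input for a classifier such as an SVM
```

In this sketch the global part summarises the whole sequence, while the local part keeps position-dependent information that a whole-sequence average discards. The concatenated vector is what would be passed to a classifier.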


Keywords: Protein Fold Recognition, Feature Extraction, Segmented Distribution, Segmented Auto Covariance, Occurrence, Support Vector Machine (SVM)



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Abdollah Dehzangi (1, 2)
  • Kuldip Paliwal (1)
  • James Lyons (1)
  • Alok Sharma (3)
  • Abdul Sattar (1, 2)

  1. Institute for Integrated and Intelligent Systems (IIIS), Griffith University, Brisbane, Australia
  2. National ICT Australia (NICTA), Brisbane, Australia
  3. University of the South Pacific, Fiji
