Exploring Potential Discriminatory Information Embedded in PSSM to Enhance Protein Structural Class Prediction Accuracy

  • Abdollah Dehzangi
  • Kuldip Paliwal
  • James Lyons
  • Alok Sharma
  • Abdul Sattar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7986)

Abstract

Determining the structural class of a given protein can provide important information about its function and its general tertiary structure. Over the last two decades, the protein structural class prediction problem has attracted considerable attention, and prediction accuracy has improved significantly. Features extracted from the Position Specific Scoring Matrix (PSSM) have played an important role in achieving this improvement. However, this information has not been adequately explored, since the accuracy of protein structural class prediction that relies on the PSSM for feature extraction remains limited. In this study, to explore this potential, we propose a segmentation-based feature extraction technique based on the concepts of amino acid distribution and auto covariance. By applying a Support Vector Machine (SVM) to our extracted features, we enhance protein structural class prediction accuracy by up to 16% over similar studies found in the literature. We achieve prediction accuracies of over 90% and 80% for the 25PDB and 1189 benchmarks, respectively, relying solely on the PSSM for feature extraction.
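
The abstract does not state the feature formulas, so the following is only a minimal sketch of the general idea behind auto covariance features, not the segmented method proposed in the paper. It assumes the commonly used definition: for each PSSM column, the average product of mean-centered scores at positions separated by a given lag. The function name `auto_covariance_features` and the `max_lag` parameter are hypothetical and chosen for illustration.

```python
def auto_covariance_features(pssm, max_lag):
    """Plain (non-segmented) auto covariance of each PSSM column.

    pssm: list of L rows, one per residue, each row holding one score per
          amino acid type (20 columns for a standard PSSM).
    max_lag: largest sequence separation considered; must be < L.
    Returns a flat list ordered column by column, lag 1..max_lag.

    Assumption: this is the textbook auto covariance, not the paper's
    segmented variant, whose details are not given in the abstract.
    """
    length = len(pssm)
    if length == 0:
        raise ValueError("pssm must contain at least one row")
    if not 1 <= max_lag < length:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(pssm)")

    n_cols = len(pssm[0])
    features = []
    for j in range(n_cols):
        column = [row[j] for row in pssm]
        mean = sum(column) / length
        centered = [value - mean for value in column]
        for lag in range(1, max_lag + 1):
            # Average product of centered scores that are `lag` residues apart.
            total = sum(
                centered[i] * centered[i + lag] for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features


# Toy matrix with 4 residues and 2 columns (a real PSSM has 20 columns).
toy_pssm = [[1, 0], [2, 0], [3, 0], [4, 0]]
toy_features = auto_covariance_features(toy_pssm, max_lag=2)
```

A vector like `toy_features` has a fixed length (number of columns times `max_lag`) regardless of protein length, which is what allows it to be passed to a classifier such as an SVM.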

Keywords

Protein Structural Class Prediction Problem; Feature Extraction; Segmented Distribution; Segmented Auto Covariance; Support Vector Machine (SVM)


Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Abdollah Dehzangi (1, 2)
  • Kuldip Paliwal (1)
  • James Lyons (1)
  • Alok Sharma (3)
  • Abdul Sattar (1, 2)

  1. Institute for Integrated and Intelligent Systems (IIIS), Griffith University, Brisbane, Australia
  2. National ICT Australia (NICTA), Brisbane, Australia
  3. University of the South Pacific, Fiji
