Exploiting Long-Range Dependencies in Protein β-Sheet Secondary Structure Prediction

  • Yizhao Ni
  • Mahesan Niranjan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6282)


We investigate whether interactions of longer range than those typically considered by local protein secondary structure prediction methods can be captured in a simple machine learning framework to improve the prediction of β-sheets. Using support vector machines and recursive feature elimination, we show that the small signals available in long-range interactions can indeed be exploited. The improvement is small but statistically significant on the benchmark datasets we used. We also show that feature selection within a long window, over amino acids at specific positions, typically selects amino acids that have been shown to be more relevant to the initiation and termination of β-sheet formation.
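The combination of a sequence window, a linear support vector machine, and recursive feature elimination can be illustrated with a minimal sketch. This is not the authors' implementation: the one-hot window encoding, the Pegasos-style subgradient trainer, the squared-weight ranking criterion, the toy labelling rule, and all function names and parameter values below are assumptions chosen only to keep the example self-contained.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def encode_window(seq, centre, half_width):
    """One-hot encode the residues at centre-half_width .. centre+half_width.
    Positions that fall outside the sequence are left as all zeros."""
    features = []
    for pos in range(centre - half_width, centre + half_width + 1):
        one_hot = [0.0] * len(AMINO_ACIDS)
        if 0 <= pos < len(seq):
            one_hot[AMINO_ACIDS.index(seq[pos])] = 1.0
        features.extend(one_hot)
    return features

def train_linear_svm(X, y, active, epochs=20, lam=0.01, seed=0):
    """Toy linear SVM (hinge loss, L2 penalty) trained by Pegasos-style
    subgradient descent, restricted to the feature indices in `active`."""
    rng = random.Random(seed)
    w = {j: 0.0 for j in active}
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(w[j] * X[i][j] for j in active)
            for j in active:
                w[j] *= 1.0 - eta * lam          # shrink (regularisation)
                if margin < 1.0:                 # hinge loss is active
                    w[j] += eta * y[i] * X[i][j]
    return w

def recursive_feature_elimination(X, y, n_keep, drop_per_round=1):
    """Repeatedly retrain and discard the features with the smallest
    squared weight until only n_keep remain."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        w = train_linear_svm(X, y, active)
        active.sort(key=lambda j: w[j] ** 2)     # least important first
        n_drop = min(drop_per_round, len(active) - n_keep)
        active = active[n_drop:]
    return sorted(active)

# Toy data (hypothetical): random 7-residue windows, labelled positive when
# the residue at offset +3 is one of a few residues picked for illustration.
rng = random.Random(1)
half_width = 3
sequences = ["".join(rng.choice(AMINO_ACIDS) for _ in range(2 * half_width + 1))
             for _ in range(60)]
X = [encode_window(s, half_width, half_width) for s in sequences]
y = [1 if s[-1] in "VIYFWT" else -1 for s in sequences]

kept = recursive_feature_elimination(X, y, n_keep=10, drop_per_round=10)
for j in kept:
    position, residue = divmod(j, len(AMINO_ACIDS))
    print(f"offset {position - half_width:+d}, residue {AMINO_ACIDS[residue]}")
```

Each surviving feature index maps back to a (window offset, amino acid) pair, which is what makes it possible to ask which residues at which positions the selection favours. A real experiment would use a much longer window, a benchmark dataset, and a properly tuned SVM.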


Keywords: Protein Secondary Structures, β-Sheet, Feature Selection, Machine Learning



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Yizhao Ni (1)
  • Mahesan Niranjan (1)
  1. ISIS Group, School of Electronics and Computer Science, University of Southampton, U.K.
