Exploiting Long-Range Dependencies in Protein β-Sheet Secondary Structure Prediction
We investigate whether interactions of longer range than those typically considered in local protein secondary structure prediction methods can be captured in a simple machine learning framework to improve the prediction of β-sheets. Using support vector machines and recursive feature elimination, we show that the small signals available in long-range interactions can indeed be exploited. The improvement is small but statistically significant on the benchmark datasets we used. We also show that feature selection within a long window and over amino acids at specific positions typically selects amino acids that have been shown to be more relevant in the initiation and termination of β-sheet formation.
Keywords: Protein Secondary Structures, β-Sheet, Feature Selection, Machine Learning
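To make the idea of selecting "amino acids at specific positions" within a long window concrete, here is a minimal sketch of one way such features might be laid out. It is an illustration under stated assumptions, not the encoding used in the paper: the function names (`encode_window`, `feature_index`, `to_dense`), the 20-letter alphabet ordering, and the `half_width` parameter are all hypothetical. Each feature is a (relative position, amino acid) pair, so a feature-selection method such as recursive feature elimination over a linear SVM could, in principle, rank individual position/residue combinations.

```python
# Hypothetical sketch: one-hot encoding of a sequence window so that each
# feature corresponds to a (relative position, amino acid) pair.
# The alphabet order and window size are assumptions, not taken from the paper.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, arbitrary order


def encode_window(sequence, center, half_width):
    """Return the active (offset, residue) features around `center`.

    Positions that fall outside the sequence, or residues outside the
    20-letter alphabet, contribute no feature.
    """
    features = {}
    for offset in range(-half_width, half_width + 1):
        idx = center + offset
        if 0 <= idx < len(sequence):
            residue = sequence[idx]
            if residue in AMINO_ACIDS:
                features[(offset, residue)] = 1
    return features


def feature_index(offset, residue, half_width):
    """Map an (offset, residue) pair to a column of the dense vector."""
    return (offset + half_width) * len(AMINO_ACIDS) + AMINO_ACIDS.index(residue)


def to_dense(features, half_width):
    """Expand sparse features into a 0/1 list of length (2*half_width+1)*20."""
    vec = [0] * ((2 * half_width + 1) * len(AMINO_ACIDS))
    for offset, residue in features:
        vec[feature_index(offset, residue, half_width)] = 1
    return vec
```

With a larger `half_width`, the dense vector gains columns for residues far from the central position, which is where any long-range signal would have to appear. Ranking those columns by their weight in a linear classifier is the kind of analysis the abstract describes.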