Abstract
Predicting the secondary structure of a protein is a central topic in bioinformatics. Threading methods need a reliable predictor to improve the prediction of tertiary structure. Moreover, the predicted secondary structure content of a protein can be used to assign the protein to a specific folding class and thus to estimate its function. We discuss here the use of support vector machines (SVMs) for the prediction of secondary structure. We report the results of an experiment comparing our results with a previously published work. We measure the performance of SVMs on a significant non-redundant set of proteins. We present for the first time a direct comparison between SVMs and feed-forward neural networks (NNs) for the task of secondary structure prediction. We use bidirectional recurrent neural networks (BRNNs) as a filter to refine the predictions of the SVM classifier. Finally, we introduce a simple but effective idea for enforcing constraints on secondary structure predictions, based on finite-state automata (FSA) and the Viterbi algorithm.
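The abstract does not spell out the automaton or the constraints used, so the following is only a minimal sketch of the general idea of combining a finite-state automaton with Viterbi decoding. It assumes per-residue class probabilities for helix (H), strand (E) and coil (C) are already available from some classifier, and it uses a hypothetical automaton in which helices must span at least three residues and strands at least two. The state names, the minimum lengths and the function `constrained_viterbi` are illustrative assumptions, not the authors' design.

```python
import math

# Hypothetical automaton (an assumption, not taken from the paper):
# helices must last >= 3 residues, strands >= 2, coil is unconstrained.
# Each automaton state emits exactly one secondary structure label.
EMITS = {"C": "C", "H1": "H", "H2": "H", "H3": "H", "E1": "E", "E2": "E"}
NEXT = {
    "C": ["C", "H1", "E1"],
    "H1": ["H2"],              # a helix cannot stop after 1 residue
    "H2": ["H3"],              # ... nor after 2
    "H3": ["H3", "C", "E1"],   # from here the helix may continue or end
    "E1": ["E2"],              # a strand cannot stop after 1 residue
    "E2": ["E2", "C", "H1"],
}
START = ("C", "H1", "E1")      # states allowed at the first residue
END = ("C", "H3", "E2")        # states allowed at the last residue


def constrained_viterbi(probs):
    """Return the most probable label string accepted by the automaton.

    probs: list of dicts {'H': p, 'E': p, 'C': p}, one per residue,
    e.g. calibrated classifier outputs. Residues are scored independently
    and the automaton only forbids illegal label sequences.
    """
    if not probs:
        return ""

    def logp(t, state):
        # Clamp to avoid log(0) for hard zero probabilities.
        return math.log(max(probs[t][EMITS[state]], 1e-12))

    score = {s: (logp(0, s) if s in START else -math.inf) for s in EMITS}
    back = []  # back[t-1][state] = best predecessor of state at position t
    for t in range(1, len(probs)):
        new_score = {s: -math.inf for s in EMITS}
        pointers = {}
        for prev, prev_score in score.items():
            if prev_score == -math.inf:
                continue
            for nxt in NEXT[prev]:
                cand = prev_score + logp(t, nxt)
                if cand > new_score[nxt]:
                    new_score[nxt] = cand
                    pointers[nxt] = prev
        back.append(pointers)
        score = new_score

    # The all-coil path is always legal, so at least one END state is reachable.
    state = max(END, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(EMITS[s] for s in reversed(path))
```

With this sketch, a residue whose independent prediction is an isolated helix surrounded by coil is relabelled as coil, because no path through the automaton can emit a single H. Transition weights are uniform here; a weighted automaton would simply add a log transition score to `cand`.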
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ceroni, A., Frasconi, P., Passerini, A., Vullo, A. (2003). A Combination of Support Vector Machines and Bidirectional Recurrent Neural Networks for Protein Secondary Structure Prediction. In: Cappelli, A., Turini, F. (eds) AI*IA 2003: Advances in Artificial Intelligence. AI*IA 2003. Lecture Notes in Computer Science, vol 2829. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39853-0_12
DOI: https://doi.org/10.1007/978-3-540-39853-0_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20119-9
Online ISBN: 978-3-540-39853-0
eBook Packages: Springer Book Archive