Support Vector Training of Protein Alignment Models

Yu, Chun-Nam John; Joachims, Thorsten; Elber, Ron; Pillardy, Jaroslaw

doi:10.1007/978-3-540-71681-5_18

Chun-Nam John Yu¹, Thorsten Joachims¹, Ron Elber¹ & Jaroslaw Pillardy²

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4453)

Included in the following conference series:

Annual International Conference on Research in Computational Molecular Biology


Abstract

Sequence-to-structure alignment is an important step in homology modeling of protein structures. Incorporating features such as secondary structure, solvent accessibility, or evolutionary information improves sequence-to-structure alignment accuracy, but conventional generative estimation techniques for alignment models impose independence assumptions that make these features difficult to include in a principled way. In this paper, we overcome this problem using a Support Vector Machine (SVM) method that provides a well-founded way of estimating complex alignment models with hundreds of thousands of parameters. Furthermore, we show that the method can be trained using a variety of loss functions. In a rigorous empirical evaluation, the SVM algorithm outperforms SSALN, a highly accurate generative alignment model that incorporates structural information. The alignment model learned by the SVM aligns 47% of the residues correctly and aligns over 70% of the residues within a shift of 4 positions.
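The abstract's central idea is to train a linear alignment scoring function with an SVM whose margin is scaled by an alignment loss. A toy sketch may help make that concrete. It is illustrative only, and every design choice below is an assumption rather than the authors' method:

- The features are only amino-acid substitution counts plus a linear gap count. The paper's model also uses structural and evolutionary features.
- Alignment is global dynamic programming. The paper may use a different alignment mode.
- The loss counts reference residue pairs missing from the prediction.
- Training is plain subgradient descent, swapped in for the cutting-plane solver of the structural SVM literature (Tsochantaridis et al.).

The names `features`, `decode`, `loss` and `train` are hypothetical.

```python
from collections import defaultdict

# Hypothetical encoding: an alignment is a sorted list of (i, j) pairs meaning
# residue x[i] is aligned to residue y[j]; any residue not in a pair is a gap.


def features(x, y, matches):
    """Joint feature vector Psi(x, y, alignment): substitution-pair counts plus
    one total gap count (a toy stand-in for the paper's richer features)."""
    psi = defaultdict(float)
    for i, j in matches:
        psi[("sub", x[i], y[j])] += 1.0
    psi[("gap",)] += (len(x) - len(matches)) + (len(y) - len(matches))
    return psi


def dot(w, psi):
    return sum(w[k] * v for k, v in psi.items())


def loss(reference, predicted):
    """Number of reference residue pairs missing from the predicted alignment."""
    return len(set(reference) - set(predicted))


def decode(w, x, y, reference=None):
    """Global DP alignment maximizing w . Psi. If `reference` is given, solve the
    loss-augmented problem max w . Psi + loss instead: the loss drops by 1 for
    each reference pair used, so those pairs receive a -1 penalty."""
    ref = set(reference or [])
    n, m, gap = len(x), len(y), w[("gap",)]
    D = [[float("-inf")] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                s = D[i - 1][j - 1] + w[("sub", x[i - 1], y[j - 1])]
                if (i - 1, j - 1) in ref:
                    s -= 1.0
                if s > D[i][j]:
                    D[i][j], back[i][j] = s, "diag"
            if i > 0 and D[i - 1][j] + gap > D[i][j]:
                D[i][j], back[i][j] = D[i - 1][j] + gap, "up"    # x[i-1] unaligned
            if j > 0 and D[i][j - 1] + gap > D[i][j]:
                D[i][j], back[i][j] = D[i][j - 1] + gap, "left"  # y[j-1] unaligned
    matches, i, j = [], n, m
    while i > 0 or j > 0:  # trace back the optimal path
        if back[i][j] == "diag":
            matches.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif back[i][j] == "up":
            i -= 1
        else:
            j -= 1
    return sorted(matches)


def train(examples, epochs=20, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularized, margin-rescaled structural
    hinge loss (swapped in for a cutting-plane SVM solver)."""
    w = defaultdict(float)
    for _ in range(epochs):
        for x, y, ref in examples:
            y_hat = decode(w, x, y, reference=ref)  # most violated alignment
            psi_ref, psi_hat = features(x, y, ref), features(x, y, y_hat)
            violation = loss(ref, y_hat) + dot(w, psi_hat) - dot(w, psi_ref)
            for k in list(w):
                w[k] *= 1.0 - lr * lam              # step on (lam/2)||w||^2
            if violation > 0:                       # hinge term is active
                for k in set(psi_ref) | set(psi_hat):
                    w[k] += lr * (psi_ref[k] - psi_hat[k])
    return w


if __name__ == "__main__":
    ref = [(0, 0), (1, 1), (2, 2), (3, 3)]
    w = train([("ACGT", "ACGT", ref)])
    print(decode(w, "ACGT", "ACGT"))  # → [(0, 0), (1, 1), (2, 2), (3, 3)]
```

The property that makes this kind of training tractable is that the loss decomposes over aligned residue pairs. Loss-augmented inference (finding the most violated alignment) therefore reuses the same dynamic program as ordinary alignment, with only a per-pair score adjustment.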

Keywords: Machine learning, Pairwise sequence alignment, Protein structure prediction.

References

Joachims, T.: Learning to align sequences: A maximum-margin approach (August 2003), http://www.joachims.org
Joachims, T., Galor, T., Elber, R.: Learning to Align Sequences: A Maximum-Margin Approach. In: Leimkuhler, B. (ed.) New Algorithms for Macromolecular Simulation. Lecture Notes in Computational Science and Engineering, vol. 49, pp. 57–68. Springer, Heidelberg (2005)
Qiu, J., Elber, R.: SSALN: an alignment algorithm using structure-dependent substitution matrices and gap penalties learned from structurally aligned protein pairs. Proteins 62, 881–891 (2006)
Bucher, P., Hofmann, K.: A sequence similarity search algorithm based on a probabilistic interpretation of an alignment scoring system. In: International Conference on Intelligent Systems for Molecular Biology (ISMB) (1996)
Durbin, R., Eddy, S., Krogh, A., Mitchison, G.: Biological Sequence Analysis. Cambridge University Press, Cambridge (1998)
Henikoff, S., Henikoff, J.G.: Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences 89, 10915–10919 (1992)
Dayhoff, M.O., Schwartz, R.M., Orcutt, B.C.: A model of evolutionary change in proteins. Atlas of Protein Sequence and Structure 5, 345–352 (1978)
Ristad, E.S., Yianilos, P.N.: Learning String Edit Distance. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(5), 522–532 (1998)
Joachims, T.: Text Categorization with Support Vector Machines: Learning with Many Relevant Features. In: Nédellec, C., Rouveirol, C. (eds.) Machine Learning: ECML-98. LNCS, vol. 1398, pp. 137–142. Springer, Heidelberg (1998), http://www-ai.cs.uni-dortmund.de/DOKUMENTE/joachims_98a.ps.gz
Tsochantaridis, I., Joachims, T., Hofmann, T., Altun, Y.: Large Margin Methods for Structured and Interdependent Output Variables. Journal of Machine Learning Research (JMLR) 6, 1453–1484 (2005)
Lafferty, J., McCallum, A., Pereira, F.: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In: International Conference on Machine Learning (ICML) (2001)
Gusfield, D., Stelling, P.: Parametric and Inverse-Parametric Sequence Alignment with XPARAL. Methods in Enzymology 266, 481–494 (1996)
Pachter, L., Sturmfels, B.: Parametric Inference for Biological Sequence Analysis. In: Proceedings of the National Academy of Sciences, vol. 101, pp. 16138–16143 (2004)
Sun, F., Fernandez-Baca, D., Yu, W.: Inverse Parametric Sequence Alignment. In: International Computing and Combinatorics Conference (COCOON) (2002)
Tsochantaridis, I., Hofmann, T., Joachims, T., Altun, Y.: Support vector machine learning for interdependent and structured output spaces. In: International Conference on Machine Learning (ICML) (2004)
Do, C.B., Gross, S.S., Batzoglou, S.: CONTRAlign: Discriminative Training for Protein Sequence Alignment. In: International Conference on Research in Computational Molecular Biology (RECOMB) (2006)
McCallum, A., Bellare, K., Pereira, F.: A Conditional Random Field for Discriminatively-Trained Finite-State String Edit Distance. In: Conference on Uncertainty in Artificial Intelligence (2005)
Kececioglu, J.D., Kim, E.: Simple and Fast Inverse Alignment. In: Apostolico, A., Guerra, C., Istrail, S., Pevzner, P., Waterman, M. (eds.) RECOMB 2006. LNCS (LNBI), vol. 3909, pp. 441–455. Springer, Heidelberg (2006)
Smith, T., Waterman, M.: Identification of Common Molecular Subsequences. Journal of Molecular Biology 147, 195–197 (1981)
Vapnik, V.: Statistical Learning Theory. Wiley, Chichester (1998)
Taskar, B., Guestrin, C., Koller, D.: Maximum-Margin Markov Networks. In: Neural Information Processing Systems (NIPS) (2003)
Shindyalov, I.N., Bourne, P.E.: Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng. 11, 739–747 (1998)
Zhang, Y., Skolnick, J.: TM-align: A protein structure alignment algorithm based on TM-score. Nucleic Acids Research 33, 2302–2309 (2005)
Adamczak, R., Porollo, A., Meller, J.: Accurate prediction of solvent accessibility using neural networks-based regression. Proteins 56, 753–767 (2004)
Kabsch, W., Sander, C.: Dictionary of protein secondary structure: pattern recognition of hydrogen bond and geometrical features. Biopolymers 22, 2577–2637 (1983)

Author information

Authors and Affiliations

Dept. of Computer Science, Cornell University, Ithaca NY 14853, USA
Chun-Nam John Yu, Thorsten Joachims & Ron Elber
Cornell Theory Center, Cornell University, Ithaca NY 14853, USA
Jaroslaw Pillardy

Editor information

Terry Speed, Haiyan Huang

About this paper

Cite this paper

Yu, CN.J., Joachims, T., Elber, R., Pillardy, J. (2007). Support Vector Training of Protein Alignment Models. In: Speed, T., Huang, H. (eds) Research in Computational Molecular Biology. RECOMB 2007. Lecture Notes in Computer Science, vol 4453. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71681-5_18

DOI: https://doi.org/10.1007/978-3-540-71681-5_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71680-8
Online ISBN: 978-3-540-71681-5
eBook Packages: Computer Science, Computer Science (R0)
