Support Vector Training of Protein Alignment Models

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2007)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4453)

Abstract

Sequence-to-structure alignment is an important step in homology modeling of protein structures. Incorporating features such as secondary structure, solvent accessibility, or evolutionary information improves sequence-to-structure alignment accuracy, but conventional generative estimation techniques for alignment models impose independence assumptions that make these features difficult to include in a principled way. In this paper, we overcome this problem using a Support Vector Machine (SVM) method that provides a well-founded way of estimating complex alignment models with hundreds of thousands of parameters. Furthermore, we show that the method can be trained using a variety of loss functions. In a rigorous empirical evaluation, the SVM algorithm outperforms SSALN, a highly accurate generative alignment model that incorporates structural information. The alignment model learned by the SVM aligns 47% of the residues correctly and aligns over 70% of the residues within a shift of 4 positions.

Keywords: Machine learning, Pairwise sequence alignment, Protein structure prediction.
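
The two accuracy figures quoted in the abstract (residues aligned correctly, and residues aligned within a shift of 4 positions) can be illustrated with a short sketch. The definition used here is an assumption, since this page does not give the paper's exact formula: an alignment is represented as a mapping from a residue index in one protein to the index of its partner in the other, and a residue counts as a hit when its predicted partner lies within `max_shift` positions of its reference partner. The function name `shift_accuracy` and the toy alignments are hypothetical.

```python
def shift_accuracy(reference, predicted, max_shift=0):
    """Fraction of reference-aligned residues whose predicted partner
    lies within max_shift positions of the reference partner.

    Both arguments map a residue index in the first protein to the index
    of its aligned residue in the second protein (assumed representation).
    Residues that the prediction leaves unaligned count as misses.
    """
    if not reference:
        raise ValueError("reference alignment has no aligned pairs")
    hits = 0
    for i, j_ref in reference.items():
        j_pred = predicted.get(i)
        if j_pred is not None and abs(j_pred - j_ref) <= max_shift:
            hits += 1
    return hits / len(reference)


# Toy alignments, invented purely for illustration.
reference = {0: 0, 1: 1, 2: 2, 3: 5, 4: 6}
predicted = {0: 0, 1: 2, 2: 3, 3: 5, 4: 12}

exact = shift_accuracy(reference, predicted, max_shift=0)
within_4 = shift_accuracy(reference, predicted, max_shift=4)
print(exact, within_4)
```

On this toy input, two of the five reference pairs are reproduced exactly (0.4) and four of the five fall within a shift of 4 (0.8). Allowing a shift rewards alignments that are locally off by a few positions, which is why the second figure in the abstract is higher than the first.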

Editor information

Terry Speed, Haiyan Huang

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Yu, CN.J., Joachims, T., Elber, R., Pillardy, J. (2007). Support Vector Training of Protein Alignment Models. In: Speed, T., Huang, H. (eds) Research in Computational Molecular Biology. RECOMB 2007. Lecture Notes in Computer Science, vol 4453. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71681-5_18

  • DOI: https://doi.org/10.1007/978-3-540-71681-5_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71680-8

  • Online ISBN: 978-3-540-71681-5

  • eBook Packages: Computer Science, Computer Science (R0)
