Boosting Protein Threading Accuracy

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5541)

Abstract

Protein threading is one of the most successful protein structure prediction methods. Most protein threading methods measure the quality of a sequence-template alignment with a scoring function that linearly combines sequence and structure features, so that the scoring function can be optimized by a dynamic programming algorithm. However, a linear scoring function cannot fully exploit interdependency among features and thus limits alignment accuracy.
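As a rough illustration of the setup described above, the following sketch scores each aligned sequence-template position with a weighted sum of features and optimizes the total with a Needleman-Wunsch style dynamic program. It is a minimal sketch, not the scoring function of any particular threading program: the two features (residue identity, secondary structure agreement), the weights, the linear gap penalty, and all function names are assumptions chosen for illustration.

```python
from typing import Callable, Sequence


def linear_score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear scoring function: a weighted sum of alignment features."""
    return sum(w * f for w, f in zip(weights, features))


def best_alignment_score(
    n_seq: int,
    n_tpl: int,
    match_score: Callable[[int, int], float],
    gap_penalty: float,
) -> float:
    """Global alignment by dynamic programming; returns the optimal total score."""
    # dp[i][j] = best score for aligning the first i sequence residues
    # with the first j template positions.
    dp = [[0.0] * (n_tpl + 1) for _ in range(n_seq + 1)]
    for i in range(1, n_seq + 1):
        dp[i][0] = dp[i - 1][0] - gap_penalty
    for j in range(1, n_tpl + 1):
        dp[0][j] = dp[0][j - 1] - gap_penalty
    for i in range(1, n_seq + 1):
        for j in range(1, n_tpl + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + match_score(i - 1, j - 1),  # align i with j
                dp[i - 1][j] - gap_penalty,                    # gap in template
                dp[i][j - 1] - gap_penalty,                    # gap in sequence
            )
    return dp[n_seq][n_tpl]


def toy_threading_score(
    seq: str,
    ss_pred: str,
    tpl: str,
    ss_tpl: str,
    weights: Sequence[float] = (2.0, 1.0),  # hypothetical weights
    gap_penalty: float = 1.0,               # hypothetical linear gap penalty
) -> float:
    """Score a toy sequence-template pair with two illustrative features."""

    def match_score(i: int, j: int) -> float:
        features = [
            1.0 if seq[i] == tpl[j] else 0.0,         # residue identity
            1.0 if ss_pred[i] == ss_tpl[j] else 0.0,  # secondary structure agreement
        ]
        return linear_score(weights, features)

    return best_alignment_score(len(seq), len(tpl), match_score, gap_penalty)
```

Because the dynamic program only needs a score for each aligned position and each gap, `match_score` could be any function of the features at that position. In this sketch it is linear, so the contribution of one feature never depends on the value of another.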

This paper presents a nonlinear scoring function for protein threading that not only models interactions among different protein features but can also be optimized efficiently by a dynamic programming algorithm. We achieve this by modeling the threading problem as a Conditional Random Field (CRF), a probabilistic graphical model, and training the model with the gradient tree boosting algorithm. The resulting model is a nonlinear scoring function consisting of a collection of regression trees, each of which models one type of nonlinear relationship among sequence and structure features. Experimental results indicate that this new threading model can effectively leverage weak biological signals and greatly improve both alignment accuracy and fold recognition rate.
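To make the idea of a scoring function built from regression trees concrete, here is a small hand-written sketch of how a sum of trees can express an interaction between two features. It shows only how such an ensemble is evaluated. It does not implement CRF training or gradient tree boosting, and the trees, thresholds, leaf values, and feature meanings (index 0 as a profile similarity, index 1 as a secondary structure match indicator) are hypothetical rather than taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    value: float


@dataclass
class Split:
    feature: int      # index into the feature vector
    threshold: float  # go left when features[feature] <= threshold
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def tree_predict(node: Node, features: Sequence[float]) -> float:
    """Walk one regression tree from the root down to a leaf value."""
    while isinstance(node, Split):
        node = node.left if features[node.feature] <= node.threshold else node.right
    return node.value


def ensemble_score(trees: Sequence[Node], features: Sequence[float]) -> float:
    """Nonlinear score: the sum of the predictions of all regression trees."""
    return sum(tree_predict(tree, features) for tree in trees)


# Hypothetical trees over features = [profile_similarity, ss_match].
TOY_TREES = [
    # Rewards a secondary structure match only when profile similarity is high.
    Split(0, 0.5, Leaf(-1.0), Split(1, 0.5, Leaf(0.0), Leaf(2.0))),
    # A small additive effect of secondary structure agreement on its own.
    Split(1, 0.5, Leaf(-0.5), Leaf(0.5)),
]
```

With these toy trees, a secondary structure match raises the score by 3.0 when profile similarity is high (from -0.5 to 2.5) but only by 1.0 when it is low (from -1.5 to -0.5). A single weighted sum of the two features cannot reproduce this kind of dependence, while the per-position score can still be used inside a dynamic programming alignment.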

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Peng, J., Xu, J. (2009). Boosting Protein Threading Accuracy. In: Batzoglou, S. (eds) Research in Computational Molecular Biology. RECOMB 2009. Lecture Notes in Computer Science, vol 5541. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02008-7_3

  • DOI: https://doi.org/10.1007/978-3-642-02008-7_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02007-0

  • Online ISBN: 978-3-642-02008-7

  • eBook Packages: Computer Science, Computer Science (R0)
