Boosting Protein Threading Accuracy

Peng, Jian; Xu, Jinbo

doi:10.1007/978-3-642-02008-7_3

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5541)

Included in the following conference series:

Annual International Conference on Research in Computational Molecular Biology

Abstract

Protein threading is one of the most successful protein structure prediction methods. Most threading methods score a sequence-template alignment with a linear combination of sequence and structure features, so that the score can be optimized by a dynamic programming algorithm. However, a linear scoring function cannot fully exploit interdependencies among features, which limits alignment accuracy.
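To make the linear setup concrete, the following is a minimal sketch and not the scoring function of any particular threading program. It assumes a hypothetical two-feature vector per aligned pair (profile similarity and secondary-structure agreement), hand-picked weights, and a fixed linear gap penalty; all names are illustrative.

```python
from typing import List, Sequence

def linear_score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Linear scoring function: a weighted sum of the features of one aligned pair."""
    return sum(w * f for w, f in zip(weights, features))

def best_alignment_score(
    pair_features: List[List[Sequence[float]]],
    weights: Sequence[float],
    gap_penalty: float,
) -> float:
    """Global-alignment dynamic programming over a sequence/template pair.

    pair_features[i][j] holds the (hypothetical) feature vector for aligning
    sequence residue i with template position j.
    """
    n = len(pair_features)
    m = len(pair_features[0]) if n else 0
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # leading gaps in the template
        best[i][0] = best[i - 1][0] - gap_penalty
    for j in range(1, m + 1):          # leading gaps in the sequence
        best[0][j] = best[0][j - 1] - gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = best[i - 1][j - 1] + linear_score(pair_features[i - 1][j - 1], weights)
            best[i][j] = max(match, best[i - 1][j] - gap_penalty, best[i][j - 1] - gap_penalty)
    return best[n][m]

# Toy input (assumed values): features are (profile similarity, secondary-structure agreement).
toy_features = [
    [(1.0, 1.0), (0.0, 0.0)],
    [(0.0, 0.0), (1.0, 1.0)],
]
toy_weights = (1.0, 0.5)
toy_score = best_alignment_score(toy_features, toy_weights, gap_penalty=1.0)
```

Because each feature contributes independently through its own weight, this form cannot express a rule such as "secondary-structure agreement matters only when the profile signal is weak".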

This paper presents a nonlinear scoring function for protein threading that can model interactions among different protein features and can still be optimized efficiently by a dynamic programming algorithm. We achieve this by modeling the threading problem with a probabilistic graphical model, the Conditional Random Field (CRF), and training the model with the gradient tree boosting algorithm. The resulting model is a nonlinear scoring function consisting of a collection of regression trees, each of which models one type of nonlinear relationship among sequence and structure features. Experimental results indicate that the new threading model can effectively leverage weak biological signals and substantially improve both alignment accuracy and fold recognition rate.
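As a rough illustration of a scoring function built from regression trees, the sketch below sums the outputs of a few small trees and plugs the result into an ordinary alignment recurrence. It is not the authors' implementation: the trees are hand-built rather than learned by gradient tree boosting, the two features (profile similarity, secondary-structure agreement) and the fixed gap penalty are assumptions, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Union

@dataclass
class Leaf:
    value: float

@dataclass
class Split:
    feature: int       # index of the feature tested at this node
    threshold: float   # go left if features[feature] <= threshold, else right
    left: "Node"
    right: "Node"

Node = Union[Leaf, Split]

def eval_tree(node: Node, features: Sequence[float]) -> float:
    """Walk one regression tree down to a leaf and return the leaf value."""
    while isinstance(node, Split):
        node = node.left if features[node.feature] <= node.threshold else node.right
    return node.value

def ensemble_score(trees: Sequence[Node], features: Sequence[float]) -> float:
    """Nonlinear score: the sum of the outputs of all regression trees."""
    return sum(eval_tree(tree, features) for tree in trees)

def best_score(pair_features: List[List[Sequence[float]]],
               score: Callable[[Sequence[float]], float],
               gap_penalty: float) -> float:
    """Global-alignment dynamic programming with a pluggable per-pair score."""
    n = len(pair_features)
    m = len(pair_features[0]) if n else 0
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = best[i - 1][0] - gap_penalty
    for j in range(1, m + 1):
        best[0][j] = best[0][j - 1] - gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = best[i - 1][j - 1] + score(pair_features[i - 1][j - 1])
            best[i][j] = max(match, best[i - 1][j] - gap_penalty, best[i][j - 1] - gap_penalty)
    return best[n][m]

# Hand-built toy trees (assumed, not learned). The first encodes an interaction:
# secondary-structure agreement (feature 1) only matters when the profile
# similarity (feature 0) is weak.
toy_trees = [
    Split(0, 0.5, Split(1, 0.5, Leaf(-1.0), Leaf(1.0)), Leaf(2.0)),
    Split(1, 0.5, Leaf(0.0), Leaf(0.25)),
]
toy_pairs = [
    [(0.9, 1.0), (0.2, 0.0)],
    [(0.2, 0.0), (0.9, 1.0)],
]
toy_total = best_score(toy_pairs, lambda f: ensemble_score(toy_trees, f), gap_penalty=1.0)
```

The score still decomposes over aligned positions, so the same recurrence applies; only the per-position function changes from a weighted sum to a sum of tree outputs.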

Author information

Authors and Affiliations

Toyota Technological Institute at Chicago, Chicago, IL, USA, 60637
Jian Peng & Jinbo Xu

Editor information

Editors and Affiliations

Computer Science Department, James H. Clark Center, 318 Campus Drive, RM S266, Stanford, CA 94305-5428, USA
Serafim Batzoglou

Cite this paper

Peng, J., Xu, J. (2009). Boosting Protein Threading Accuracy. In: Batzoglou, S. (eds) Research in Computational Molecular Biology. RECOMB 2009. Lecture Notes in Computer Science, vol 5541. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02008-7_3

DOI: https://doi.org/10.1007/978-3-642-02008-7_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02007-0
Online ISBN: 978-3-642-02008-7
eBook Packages: Computer Science, Computer Science (R0)
