Protein Structure Prediction Using Threading

Xu, Jinbo; Jiao, Feng; Yu, Libo

doi:10.1007/978-1-59745-574-9_4

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 413)

Summary

This chapter describes a protocol for computational protein structure prediction by protein threading. First, we present a general procedure and summarize typical ideas for each step of protein threading. Then, we describe the design and implementation of RAPTOR, a threading-based protein structure prediction program. We focus on three key components of RAPTOR: a linear programming approach to protein threading, two machine learning approaches to fold recognition (support vector machines [SVM] and gradient boosting), and the evaluation of the statistical significance of prediction results. The first part of this chapter is a brief review of protein threading, and the second part contains original research results. Some key ideas and results have been published previously.
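The summary lists the evaluation of statistical significance as one of RAPTOR's key components. One classic way to judge whether a threading score is meaningful (not necessarily the procedure used in RAPTOR) is to compare it with the scores of randomly shuffled versions of the query sequence and report a z-score. The minimal Python sketch below assumes a hypothetical, deliberately simple gapless scoring function, `toy_environment_score`, that rewards hydrophobic residues at buried template sites ('B') and other residues at exposed sites ('E'). Practical threading scoring functions are considerably richer, and all names and parameters here are illustrative.

```python
import random
import statistics
from typing import Callable, Sequence

# Hypothetical residue class used only by the toy score below.
HYDROPHOBIC = set("AVILMFWC")


def toy_environment_score(seq: str, env: str) -> int:
    """Hypothetical gapless threading score.

    +1 for each hydrophobic residue aligned to a buried ('B') template site,
    +1 for each other residue aligned to a non-buried (e.g. 'E') site.
    """
    if len(seq) != len(env):
        raise ValueError("sequence and template must have the same length")
    score = 0
    for residue, site in zip(seq, env):
        # Reward agreement between residue hydrophobicity and site burial.
        if (residue in HYDROPHOBIC) == (site == "B"):
            score += 1
    return score


def z_score(raw: float, background: Sequence[float]) -> float:
    """Standardize a raw score against a background score sample."""
    mu = statistics.mean(background)
    sd = statistics.stdev(background)  # sample standard deviation
    if sd == 0:
        raise ValueError("background scores have zero variance")
    return (raw - mu) / sd


def shuffled_z_score(seq: str, score_fn: Callable[[str], float],
                     n_shuffles: int = 200, seed: int = 0) -> float:
    """Z-score of score_fn(seq) against composition-preserving shuffles."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    residues = list(seq)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # random permutation, same composition
        background.append(score_fn("".join(residues)))
    return z_score(score_fn(seq), background)


if __name__ == "__main__":
    env = "BBEE" * 5    # hypothetical 20-site template
    query = "LLKK" * 5  # Leu at buried sites, Lys at exposed sites
    z = shuffled_z_score(query, lambda s: toy_environment_score(s, env))
    print(f"z-score: {z:.2f}")
```

Because shuffling preserves amino acid composition, a high z-score suggests that the query fits the template through its residue order rather than its composition alone; the number of shuffles and the random seed are arbitrary choices in this sketch.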

References

Kihara, D. and J. Skolnick, The PDB is a covering set of small protein structures. J Mol Biol, 2003. 334(4): p. 793–802.
Zhang, Y. and J. Skolnick, The protein structure prediction problem could be solved using the current PDB library. Proc Natl Acad Sci USA, 2005. 102(4): p. 1029–1034.
Rost, B., Twilight zone of protein sequence alignments. Protein Eng, 1999. 12: p. 85–94.
Murzin, A.G., et al., SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol, 1995. 247: p. 536–540.
Altschul, S.F., et al., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res, 1997. 25: p. 3389–3402.
Thompson, J.D., D.G. Higgins, and T.J. Gibson, CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res, 1994. 22: p. 4673–4680.
Jones, D.T., Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol, 1999. 292: p. 195–202.
Rost, B., C. Sander, and R. Schneider, PHD–an automatic mail server for protein secondary structure prediction. Comput Appl Biosci, 1994. 10(1): p. 53–60.
Shi, J., T.L. Blundell, and K. Mizuguchi, FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J Mol Biol, 2001. 310: p. 243–257.
Kelley, L.A., R.M. MacCallum, and M.J. Sternberg, Enhanced genome annotation using structural profiles in the program 3D-PSSM. J Mol Biol, 2000. 299(2): p. 499–520.
Kim, D., et al., PROSPECT II: protein structure prediction method for genome-scale applications. Protein Eng, 2003. 16(9): p. 641–650.
Xu, J., et al., RAPTOR: optimal protein threading by linear programming. J Bioinform Comput Biol, 2003. 1(1): p. 95–117.
Jones, D.T., GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences. J Mol Biol, 1999. 287: p. 797–815.
Fischer, D., Hybrid fold recognition: combining sequence derived properties with evolutionary information. Proceedings of the 2000 Pacific Symposium on Biocomputing. 2000. World Scientific.
Rychlewski, L., et al., Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Sci, 2000. 9(2): p. 232–241.
Karplus, K., et al., Combining local-structure, fold-recognition, and new fold methods for protein structure prediction. Proteins, 2003. 53 Suppl 6: p. 491–496.
Rost, B., R. Schneider, and C. Sander, Protein fold recognition by prediction-based threading. J Mol Biol, 1997. 270(3): p. 471–480.
Al-Lazikani, B., F. Sheinerman, and B. Honig, Combining multiple structure and sequence alignments to improve sequence detection and alignment: application to the SH2 domains of Janus kinases. Proc Natl Acad Sci USA, 2001. 98(26): p. 14796–14801.
Zhou, H. and Y. Zhou, SPARKS 2 and SP3 servers in CASP 6. Proteins, 2005. 61 Suppl 7: p. 152–156.
Kabsch, W. and C. Sander, Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 1983. 22: p. 2577–2637.
Xu, Y. and D. Xu, Protein threading using PROSPECT: design and evaluation. Proteins, 2000. 40: p. 343–354.
Singh, R. K., A. Tropsha, and I. I. Vaisman, Delaunay tessellation of proteins: four body nearest-neighbor propensities of amino acid residues. J Comput Biol, 1996. 3(2): p. 213–221.
Skolnick, J. and D. Kihara, Defrosting the frozen approximation: PROSPECTOR–a new approach to threading. Proteins, 2001. 42(3): p. 319–331.
Zheng, W., et al., A new approach to protein fold recognition based on Delaunay tessellation of protein structure. Pacific Symposium on Biocomputing, 1997. p. 486–497.
McConkey, B.J., V. Sobolev, and M. Edelman, Quantification of protein surfaces, volumes and atom-atom contacts using a constrained Voronoi procedure. Bioinformatics, 2002. 18(10): p. 1365–1373.
Madej, T., J. F. Gibrat, and S. H. Bryant, Threading a database of protein cores. Proteins, 1995. 23.
Lathrop, R., et al., A Bayes-optimal probability theory that unifies protein sequence-structure recognition and alignment. Bull Math Biol, 1998. 60: p. 1039–1071.
Zhang, C., et al., An accurate, residue-level, pair potential of mean force for folding and binding based on the distance-scaled, ideal-gas reference state. Protein Sci, 2004. 13(2): p. 400–411.
Meller, J. and R. Elber, Linear programming optimization and a double statistical filter for protein threading protocols. Proteins, 2001. 45(3): p. 241–261.
Lathrop, R.H., The protein threading problem with sequence amino acid interaction preferences is NP-complete. Protein Eng, 1994. 7: p. 1059–1068.
Akutsu, T. and S. Miyano, On the approximation of protein threading. Theor Comput Sci, 1999. 210: p. 261–275.
Bryant, S. H. and C. E. Lawrence, An empirical energy function for threading protein sequence through the folding motif. Proteins, 1993. 16: p. 92–112.
Jones, D. T., W. R. Taylor, and J. M. Thornton, A new approach to protein fold recognition. Nature, 1992. 358: p. 86–89.
Lathrop, R. H. and T. F. Smith, A branch-and-bound algorithm for optimal protein threading with pairwise (contact potential) amino acid interactions. Proceedings of the 27th Hawaii International Conference on System Sciences. 1994. IEEE.
Lathrop, R. H. and T. F. Smith, Global optimum protein threading with gapped alignment and empirical pair score functions. J Mol Biol, 1996. 255: p. 641–665.
Lathrop, R. H., An anytime local-to-global optimization algorithm for protein threading in Θ(m²n²) space. J Comput Biol, 1999. 6(3–4): p. 405–418.
Xu, Y., D. Xu, and E. C. Uberbacher, An efficient computational method for globally optimal threadings. J Comput Biol, 1998. 5(3): p. 597–614.
Xu, J., F. Jiao, and B. Berger, A tree-decomposition approach to protein structure prediction. Proc IEEE Comput Syst Bioinform Conf, 2005. p. 247–256.
Godzik, A., A. Kolinski, and J. Skolnick, Topology fingerprint approach to the inverse protein folding problem. J Mol Biol, 1992. 227(1): p. 227–238.
Thiele, R., R. Zimmer, and T. Lengauer, Protein threading by recursive dynamic programming. J Mol Biol, 1999. 290: p. 757–779.
Balev, S., Solving the protein threading problem by Lagrangian relaxation. Proceedings of the 4th Workshop on Algorithms in Bioinformatics (WABI 2004), LNBI 3240. 2004. p. 182–193.
Bryant, S. H. and S. F. Altschul, Statistics of sequence-structure threading. Curr Opin Struct Biol, 1995. 5: p. 236–244.
Xu, Y., D. Xu, and V. Olman, A practical method for interpretation of threading scores: an application of neural networks. Statistica Sinica Special Issue on Bioinformatics, 2002. 12: p. 159–177.
Wallner, B. and A. Elofsson, Can correct protein models be identified? Protein Sci, 2003. 12(5): p. 1073–1086.
Xu, J., Fold recognition by predicted alignment accuracy. IEEE/ACM Trans Comput Biol Bioinformatics, 2005. 2(2): p. 157–165.
Xu, J., et al., Protein threading by linear programming. Pacific Symposium on Biocomputing, 2003. p. 264–275.
Alexandrov, N. N., R. Nussinov, and R. M. Zimmer, Fast protein fold recognition via sequence to structure alignment and contact capacity potentials. Pacific Symposium on Biocomputing, 1996. Hawaii, USA. p. 53–72.
Shepp, L., Linear Programming in Tomography, Probability and Finance, DIMACS TR97-67, 1997, Rutgers University, NJ, USA.
Dorfman, R., P.A. Samuelson, and R.M. Solow, Linear Programming and Economic Analysis. 1987. McGraw-Hill, New York.
Schrijver, A., Theory of Linear and Integer Programming. 1998. John Wiley & Sons, New York.
Beasley, J.E., Advances in Linear and Integer Programming. 1996. Oxford University Press, University of Oxford, United Kingdom.
Vanderbei, R. J., Integer Programming. 2001. Springer, New York. p. 307–313.
Dantzig, G.B., Linear Programming and Extensions. 1963. Princeton University Press, Princeton, N. J.
Karmarkar, N., A new polynomial-time algorithm for linear programming. Combinatorica, 1984. 4: p. 373–395.
Alexandrov, N. N., SARFing the PDB. Protein Eng, 1996. 9: p. 727–732.
Holm, L. and C. Sander, Mapping the protein universe. Science, 1996. 273: p. 595–602.
Holm, L. and C. Sander, Decision support system for the evolutionary classification of protein structures. Proceedings of the Fifth International Conference on Intelligent Systems for Molecular Biology. 1997.
Fischer, D., et al., Assessing the performance of fold recognition methods by means of a comprehensive benchmark. Proceedings of the 1996 Pacific Symposium on Biocomputing. 1996. World Scientific.
Lindahl, E. and A. Elofsson, Identification of related proteins on family, superfamily and fold level. J Mol Biol, 2000. 295: p. 613–625.
Vapnik, V. N., The Nature of Statistical Learning Theory. 1995. Springer, New York.
Burges, C. J. C., A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 1998. 2(2): p. 121–167.
Freund, Y. and R. E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting. European Conference on Computational Learning Theory. 1995.
Friedman, J. H., Greedy function approximation: a gradient boosting machine. Annals of Statistics, 2001. 29(5): p. 1189–1232.
Michie, D., D.J. Spiegelhalter, and C.C. Taylor (eds.), Machine Learning, Neural and Statistical Classification. 1994. Ellis Horwood, London.
Zhang, Y. and J. Skolnick, Automated structure prediction of weakly homologous proteins on a genomic scale. Proc Natl Acad Sci USA, 2004. 101(20): p. 7594–7599.
Simons, K., et al., Ab initio protein structure prediction of CASP III targets using ROSETTA. Proteins, 1999. Suppl 3: p. 171–176.

Editor information

Editors and Affiliations

Rensselaer Polytechnic Institute, Troy, New York, USA
Mohammed J. Zaki & Christopher Bystroff

About this protocol

Cite this protocol

Xu, J., Jiao, F., Yu, L. (2008). Protein Structure Prediction Using Threading. In: Zaki, M.J., Bystroff, C. (eds) Protein Structure Prediction. Methods in Molecular Biology™, vol 413. Humana Press. https://doi.org/10.1007/978-1-59745-574-9_4

DOI: https://doi.org/10.1007/978-1-59745-574-9_4
Publisher Name: Humana Press
Print ISBN: 978-1-58829-752-5
Online ISBN: 978-1-59745-574-9
eBook Packages: Springer Protocols
