Summary
This chapter discusses the protocol for computational protein structure prediction by protein threading. First, we present a general procedure and summarize some typical ideas for each step of protein threading. Then, we describe the design and implementation of RAPTOR, a protein structure prediction program based on threading. We focus on three key components of RAPTOR: a linear programming approach to protein threading, two machine learning approaches (SVM and gradient boosting) to fold recognition, and evaluation of the statistical significance of the prediction results. The first part of this chapter is a brief review of protein threading, and the second part contains original research results. Some key ideas and results have been previously published.
Copyright information
© 2008 Humana Press Inc
About this protocol
Cite this protocol
Xu, J., Jiao, F., Yu, L. (2008). Protein Structure Prediction Using Threading. In: Zaki, M.J., Bystroff, C. (eds) Protein Structure Prediction. Methods in Molecular Biology™, vol 413. Humana Press. https://doi.org/10.1007/978-1-59745-574-9_4
DOI: https://doi.org/10.1007/978-1-59745-574-9_4
Publisher Name: Humana Press
Print ISBN: 978-1-58829-752-5
Online ISBN: 978-1-59745-574-9
eBook Packages: Springer Protocols