
Protein Structure Prediction Using Threading

Protocol, in: Protein Structure Prediction

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 413)

Summary

This chapter discusses the protocol for computational protein structure prediction by protein threading. First, we present a general procedure and summarize typical ideas for each step of protein threading. Then, we describe the design and implementation of RAPTOR, a threading-based protein structure prediction program. We focus on three key components of RAPTOR: a linear programming approach to protein threading, two machine learning approaches to fold recognition (support vector machines, SVM, and gradient boosting), and evaluation of the statistical significance of the prediction results. The first part of this chapter is a brief review of protein threading, and the second part contains original research results. Some key ideas and results have been published previously.
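The generic threading loop named above has three parts: align a query sequence to a template, score the alignment, and then judge whether that score is significant. These can be sketched in a few lines. This is a minimal illustration under stated assumptions and is not RAPTOR's method. It uses only position-specific (singleton) scores and a linear gap penalty, so plain dynamic programming finds the optimal alignment. The chapter's linear programming approach targets threading that also includes pairwise interaction terms, which this kind of dynamic programming cannot handle exactly. The toy profile, the gap value of -2.0, and the names `thread_score`, `z_score`, and `toy_profile` are all hypothetical. The shuffled-sequence z-score is one common significance estimate and is not necessarily the one used in the chapter.

```python
import random
import statistics
from typing import Dict, List

# Hypothetical template "profile": for each structural position, a score per
# amino acid. Real scores might come from secondary structure or solvent
# accessibility; these values are made up for illustration only.
Profile = List[Dict[str, float]]


def thread_score(query: str, profile: Profile, gap: float = -2.0) -> float:
    """Optimal global alignment score of `query` onto the template positions.

    Only singleton (position-specific) terms and a linear gap penalty are
    used, so dynamic programming finds the optimum exactly. Pairwise contact
    terms are deliberately left out of this sketch.
    """
    n, m = len(query), len(profile)
    # dp[i][j]: best score aligning query[:i] with template positions [:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            aligned = dp[i - 1][j - 1] + profile[j - 1].get(query[i - 1], 0.0)
            dp[i][j] = max(aligned, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[n][m]


def z_score(query: str, profile: Profile, trials: int = 200, seed: int = 0) -> float:
    """Z-score of the real score against scores of shuffled copies of `query`."""
    rng = random.Random(seed)
    real = thread_score(query, profile)
    background = []
    for _ in range(trials):
        letters = list(query)
        rng.shuffle(letters)  # same composition, scrambled order
        background.append(thread_score("".join(letters), profile))
    mean = statistics.mean(background)
    sd = statistics.pstdev(background)
    return (real - mean) / sd if sd > 0 else float("inf")


# Toy four-position template; amino acids not listed at a position score 0.
toy_profile: Profile = [{"H": 3.0}, {"L": 3.0}, {"A": 2.0}, {"W": 4.0}]

print(thread_score("HLAW", toy_profile))  # 12.0: every residue matches its position
print(z_score("HLAW", toy_profile) > 0)   # True: the native order beats its shuffles
```

Shuffling preserves amino-acid composition, so the background reflects how well any sequence with the same residues fits the template. A large positive z-score therefore suggests that the residue order, and not just the composition, fits the structure.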




Copyright information

© 2008 Humana Press Inc

About this protocol

Cite this protocol

Xu, J., Jiao, F., Yu, L. (2008). Protein Structure Prediction Using Threading. In: Zaki, M.J., Bystroff, C. (eds) Protein Structure Prediction. Methods in Molecular Biology™, vol 413. Humana Press. https://doi.org/10.1007/978-1-59745-574-9_4


  • DOI: https://doi.org/10.1007/978-1-59745-574-9_4

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-752-5

  • Online ISBN: 978-1-59745-574-9

  • eBook Packages: Springer Protocols
