Abstract
Theoreticians have been trying to predict protein structure from sequence information for decades. More than a quarter of a century ago, there were already optimistic reports that simulation methods could be used to calculate the structure of a small protein given only its sequence (1,2). To this day, devotees of this approach persevere and may ultimately overcome the problems posed by force fields and the enormous search space. In the meantime, a class of protein structure prediction methods has developed, traveling under names such as “protein threading” and “fold recognition.”
References
Levitt, M. and Warshel, A. (1975) Computer simulation of protein folding. Nature 253, 694–698.
Levitt, M. (1976) A simplified representation of protein conformations for rapid simulation of protein folding. J. Mol. Biol. 104, 59–107.
Crippen, G. M. and Maiorov, V. N. (1995) How many protein-folding motifs are there? J. Mol. Biol. 252, 144–151.
Leonov, H., Mitchell, J. S. B., and Arkin, I. T. (2003) Monte Carlo estimation of the number of possible protein folds: effects of sampling bias and folds distributions. Proteins 51, 352–359.
Wolf, Y. I., Grishin, N. V., and Koonin, E. V. (2000) Estimating the number of protein folds and families from complete genome data. J. Mol. Biol. 299, 897–905.
Govindarajan, S., Recabarren, R., and Goldstein, R. A. (1999) Estimating the total number of protein folds. Proteins 35, 408–414.
Zhang, C. O. and DeLisi, C. (1998) Estimating the number of protein folds. J. Mol. Biol. 284, 1301–1305.
Wang, Z. X. (1998) A re-estimation for the total numbers of protein folds and superfamilies. Protein Eng. 11, 621–626.
Zhang, C. T. (1997) Relations of the numbers of protein sequences, families and folds. Protein Eng. 10, 757–761.
Wang, Z. X. (1996) How many fold types of protein are there in nature? Proteins 26, 186–191.
Orengo, C. A., Jones, D. T., and Thornton, J. M. (1994) Protein superfamilies and domain superfolds. Nature 372, 631–634.
Chothia, C. (1992) Proteins. One thousand families for the molecular biologist. Nature 357, 543–544.
England, J. L., Shakhnovich, B. E., and Shakhnovich, E. I. (2003) Natural selection of more designable folds: a mechanism for thermophilic adaptation. Proc. Natl. Acad. Sci. USA 100, 8727–8731.
Li, H., Tang, C., and Wingreen, N. S. (2002) Designability of protein structures: a lattice-model study using the Miyazawa-Jernigan matrix. Proteins 49, 403–412.
Miller, J., Zeng, C., Wingreen, N. S., and Tang, C. (2002) Emergence of highly designable protein-backbone conformations in an off-lattice model. Proteins 47, 506–512.
Helling, R., Li, H., Melin, R., et al. (2001) The designability of protein structures. J. Mol. Graph. Mod. 19, 157–167.
Shahrezaei, V. and Ejtehadi, M. R. (2000) Geometry selects highly designable structures. J. Chem. Phys. 113, 6437–6442.
Bornberg-Bauer, E. (1997) How are model protein structures distributed in sequence space? Biophys. J. 73, 2393–2403.
Govindarajan, S. and Goldstein, R. A. (1996) Why are some protein structures so common? Proc. Natl. Acad. Sci. USA 93, 3341–3345.
Orengo, C. (1994) Classification of protein folds. Curr. Opin. Struct. Biol. 4, 429–440.
Berman, H. M., Westbrook, J., Feng, Z., et al. (2000) The Protein Data Bank. Nucleic Acids Res. 28, 235–242.
Brenner, S. E., Chothia, C., and Hubbard, T. J. P. (1998) Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. Proc. Natl. Acad. Sci. USA 95, 6073–6078.
Rost, B. (1999) Twilight zone of protein sequence alignments. Protein Eng. 12, 85–94.
Pearson, W. R. and Lipman, D. J. (1988) Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85, 2444–2448.
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990) Basic local alignment search tool. J. Mol. Biol. 215, 403–410.
Altschul, S. F., Madden, T. L., Schaffer, A. A., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
Madej, T., Gibrat, J. F., and Bryant, S. H. (1995) Threading a database of protein cores. Proteins 23, 356–369.
Huber, T. and Torda, A. E. (2002) Protein structure prediction by threading: force field philosophy, approaches to alignment. In: Tsigelny, I. F. (ed.), Protein Structure Prediction: A Bioinformatic Approach, International University Line, La Jolla, pp. 263–
Cornell, W. D., Cieplak, P., Bayly, C. I., et al. (1995) A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. J. Am. Chem. Soc. 117, 5179–5197.
van Gunsteren, W. F., Billeter, S. R., Eising, A. A., et al. (1996) Biomolecular simulation: the GROMOS96 manual and user guide, vdf Hochschulverlag AG an der ETH Zurich and BIOMOS b.v., Zurich and Groningen.
MacKerell, A. D., Bashford, D., Bellott, M., et al. (1998) All-atom empirical potential for molecular modeling and dynamics studies of proteins. J. Phys. Chem. B 102, 3586–3616.
Brooks, B. R., Bruccoleri, R. E., Olafson, B. D., States, D. J., Swaminathan, S., and Karplus, M. (1983) CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J. Comput. Chem. 4, 187–217.
Chandler, D. (1987) Introduction to Modern Statistical Mechanics, Oxford University Press, New York.
Miyazawa, S. and Jernigan, R. L. (1985) Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation. Macromolecules 18, 534–552.
Tanaka, S. and Scheraga, H. A. (1976) Statistical mechanical treatment of protein conformation. 1. Conformational properties of amino-acids in proteins. Macromolecules 9, 142–159.
Sippl, M. J. (1993) Boltzmann’s principle, knowledge-based mean fields and protein folding. An approach to the computational determination of protein structures. J. Comput. Aided Mol. Des. 7, 473–501.
Jones, D. T., Taylor, W. R., and Thornton, J. M. (1992) A new approach to protein fold recognition. Nature 358, 86–89.
Skolnick, J., Jaroszewski, L., Kolinski, A., and Godzik, A. (1997) Derivation and testing of pair potentials for protein folding. When is the quasichemical approximation correct? Protein Sci. 6, 676–688.
Ben-Naim, A. (1997) Statistical potentials extracted from protein structures: are these meaningful potentials? J. Chem. Phys. 107, 3698–3706.
Thomas, P. D. and Dill, K. (1996) Statistical potentials extracted from protein structures: how accurate are they? J. Mol. Biol. 257, 457–469.
Sippl, M. J. (1996) Helmholtz free energy of peptide hydrogen bonds in proteins. J. Mol. Biol. 260, 644–648.
Sippl, M. J., Ortner, M., Jaritz, M., Lackner, P., and Flockner, H. (1996) Helmholtz free energies of atom pair interactions in proteins. Fold. Des. 1, 289–298.
Shortle, D. (2003) Propensities, probabilities, and the Boltzmann hypothesis. Protein Sci. 12, 1298–1302.
Crippen, G. M. and Snow, M. E. (1990) A 1.8 angstrom resolution potential function for protein folding. Biopolymers 29, 1479–1489.
Crippen, G. M. (1996) Easily searched protein folding potentials. J. Mol. Biol. 260, 467–475.
Goldstein, R. A., Luthey-Schulten, Z. A., and Wolynes, P. G. (1992) Protein tertiary structure recognition using optimized Hamiltonians with local interactions. Proc. Natl. Acad. Sci. USA 89, 9029–9033.
Maiorov, V. N. and Crippen, G. M. (1992) Contact potential that recognizes the correct folding of globular proteins. J. Mol. Biol. 227, 876–888.
Seetharamulu, P. and Crippen, G. M. (1991) A potential function for protein folding. J. Math. Chem. 6, 91–110.
Ulrich, P., Scott, W., van Gunsteren, W. F., and Torda, A. E. (1997) Protein structure prediction force fields: parametrization with quasi-Newtonian dynamics. Proteins 27, 367–384.
Huber, T. and Torda, A. E. (1998) Protein fold recognition without Boltzmann statistics or explicit physical basis. Protein Sci. 7, 142–149.
Hao, M. H. and Scheraga, H. A. (1996) How optimization of potential functions affects protein folding. Proc. Natl. Acad. Sci. USA 93, 4984–4989.
Mirny, L. A. and Shakhnovich, E. I. (1996) How to derive a protein folding potential? A new approach to an old problem. J. Mol. Biol. 264, 1164–1179.
Koretke, K. K., Luthey-Schulten, Z., and Wolynes, P. G. (1996) Self-consistently optimized statistical mechanical energy functions for sequence structure alignment. Protein Sci. 5, 1043–1059.
Lemer, C. M., Rooman, M. J., and Wodak, S. J. (1995) Protein structure prediction by threading methods: evaluation of current techniques. Proteins 23, 337–355.
Chang, J., Carrillo, M. W., Waugh, A., Wei, L. P., and Altman, R. B. (2002) Scoring functions sensitive to alignment error have a more difficult search: a paradox for threading. In: Eaton, G. R., Wiley, D. C., and Jardetzky, O. (eds.) Structures and Mechanisms: From Ashes to Enzymes, vol. 827. Oxford University Press, Oxford, UK: 309–320.
Lathrop, R. H. (1994) The protein threading problem with sequence amino acid interaction preferences is NP-complete. Protein Eng. 7, 1059–1068.
Smith, T. F. and Waterman, M. S. (1981) Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197.
Needleman, S. B. and Wunsch, C. D. (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453.
Kirkpatrick, S., Gelatt, Jr., C. D., and Vecchi, M. P. (1983) Optimization by simulated annealing. Science 220, 671–680.
Bryant, S. H. and Lawrence, C. E. (1993) An empirical energy function for threading protein-sequence through the folding motif. Proteins 16, 92–112.
Wilmanns, M. and Eisenberg, D. (1995) Inverse protein folding by the residue pair preference profile method: estimating the correctness of alignments of structurally compatible sequences. Protein Eng. 8, 627–639.
Godzik, A., Kolinski, A., and Skolnick, J. (1992) Topology fingerprint approach to the inverse protein folding problem. J. Mol. Biol. 227, 227–238.
Taylor, W. R. (1997) Multiple sequence threading: an analysis of alignment quality and stability. J. Mol. Biol. 269, 902–943.
Huber, T. and Torda, A. E. (1999) Protein sequence threading, the alignment problem and a two step strategy. J. Comput. Chem. 20, 1455–1467.
Xu, Y. and Xu, D. (2000) Protein threading using PROSPECT: design and evaluation. Proteins 40, 343–354.
Lathrop, R. H. (1999) An anytime local-to-global optimization algorithm for protein threading in O(m²n²) space. J. Comput. Biol. 6, 405–418.
Xu, Y. and Uberbacher, E. C. (1996) A polynomial-time algorithm for a class of protein threading problems. Comput. Appl. Biosci. 12, 511–517.
Lathrop, R. H. and Smith, T. F. (1996) Global optimum protein threading with gapped alignment and empirical pair score functions. J. Mol. Biol. 255, 641–665.
Xu, J. and Li, M. (2003) Assessment of RAPTOR’s linear programming approach in CAFASP3. Proteins 53, 579–584.
Crippen, G. M. (1996) Failures of inverse folding and threading with gapped alignment. Proteins 26, 167–171.
Park, B. H., Huang, E. S., and Levitt, M. (1997) Factors affecting the ability of energy functions to discriminate correct from incorrect folds. J. Mol. Biol. 266, 831–846.
Altschul, S. F., Boguski, M. S., Gish, W., and Wootton, J. C. (1994) Issues in searching molecular sequence databases. Nat. Genet. 6, 119–129.
Altschul, S. F. and Gish, W. (1996) Local alignment statistics. Methods Enzymol. 266, 460–480.
Karlin, S. and Altschul, S. F. (1990) Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc. Natl. Acad. Sci. USA 87, 2264–2268.
Pearson, W. R. (1998) Empirical statistical estimates for sequence similarity searches. J. Mol. Biol. 276, 71–84.
Mott, R. (2000) Accurate formula for p-values of gapped local sequence and profile alignments. J. Mol. Biol. 300, 649–659.
Sommer, I., Zien, A., von Ohsen, N., Zimmer, R., and Lengauer, T. (2002) Confidence measures for protein fold recognition. Bioinformatics 18, 802–812.
Jones, D. T. (1999) GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences. J. Mol. Biol. 287, 797–815.
Juan, D., Grana, O., Pazos, F., Fariselli, P., Casadio, R., and Valencia, A. (2003) A neural network approach to evaluate fold recognition results. Proteins 50, 600–608.
Xu, Y., Xu, D., and Olman, V. (2002) A practical method for interpretation of threading scores: an application of neural network. Stat. Sin. 12, 159–177.
McGuffin, L. J. and Jones, D. T. (2003) Improvement of the GenTHREADER method for genomic fold recognition. Bioinformatics 19, 874–881.
Karplus, K., Sjolander, K., Barrett, C., et al. (1997) Predicting protein structure using hidden Markov models. Proteins, 134–139.
Karplus, K., Barrett, C., and Hughey, R. (1998) Hidden Markov models for detecting remote protein homologies. Bioinformatics 14, 846–856.
Karplus, K., Barrett, C., Cline, M., Diekhans, M., Grate, L., and Hughey, R. (1999) Predicting protein structure using only sequence information. Proteins, 121–125.
Panchenko, A. R., Marchler-Bauer, A., and Bryant, S. H. (2000) Combination of threading potentials and sequence profiles improves fold recognition. J. Mol. Biol. 296, 1319–1331.
Russell, A. and Torda, A. E. (2002) Protein sequence threading-averaging over structures. Proteins 47, 496–505.
Kelley, L. A., MacCallum, R. M., and Sternberg, M. J. E. (2000) Enhanced genome annotation using structural profiles in the program 3D-PSSM. J. Mol. Biol. 299, 499–520.
Fischer, D. and Eisenberg, D. (1996) Protein fold recognition using sequence-derived predictions. Protein Sci. 5, 947–955.
Russell, R. B., Copley, R. R., and Barton, G. J. (1996) Protein fold recognition by mapping predicted secondary structures. J. Mol. Biol. 259, 349–365.
Rost, B., Schneider, R., and Sander, C. (1997) Protein fold recognition by prediction-based threading. J. Mol. Biol. 270, 471–480.
Di Francesco, V., Munson, P. J., and Garnier, J. (1999) FORESST: fold recognition from secondary structure predictions of proteins. Bioinformatics 15, 131–140.
Ayers, D. J., Gooley, P. R., Widmer-Cooper, A., and Torda, A. E. (1999) Enhanced protein fold recognition using secondary structure information from NMR. Protein Sci. 8, 1127–1133.
Hargbo, J. and Elofsson, A. (1999) Hidden Markov models that use predicted secondary structures for fold recognition. Proteins 36, 68–76.
Ota, M., Kawabata, T., Kinjo, A. R., and Nishikawa, K. (1999) Cooperative approach for the protein fold recognition. Proteins, 126–132.
Koretke, K. K., Russell, R. B., Copley, R. R., and Lupas, A. N. (1999) Fold recognition using sequence and secondary structure information. Proteins, 141–148.
Rost, B. and Liu, J. F. (2003) The PredictProtein server. Nucleic Acids Res. 31, 3300–3304.
Eyrich, V. A. and Rost, B. (2003) META-PP: single interface to crucial prediction servers. Nucleic Acids Res. 31, 3308–3310.
Koh, I. Y. Y., Eyrich, V. A., Marti-Renom, M. A., et al. (2003) EVA: evaluation of protein structure prediction servers. Nucleic Acids Res. 31, 3311–3315.
Kim, D., Xu, D., Guo, J. T., Ellrott, K., and Xu, Y. (2003) PROSPECT II: protein structure prediction program for genome-scale applications. Protein Eng. 16, 641–650.
Lu, L., Lu, H., and Skolnick, J. (2002) MULTIPROSPECTOR: an algorithm for the prediction of protein-protein interactions by multimeric threading. Proteins 49, 350–364.
McGuffin, L. J. and Jones, D. T. (2002) Targeting novel folds for structural genomics. Proteins 48, 44–52.
Jones, D. T. (2001) Predicting novel protein folds by using FRAGFOLD. Proteins, 127–132.
Skolnick, J., Kolinski, A., Kihara, D., Betancourt, M., Rotkiewicz, P., and Boniecki, M. (2001) Ab initio protein structure prediction via a combination of threading, lattice folding, clustering, and structure refinement. Proteins, 149–156.
Zhang, Y., Kolinski, A., and Skolnick, J. (2003) TOUCHSTONE II: a new approach to ab initio protein structure prediction. Biophys. J. 85, 1145–1164.
Kihara, D., Lu, H., Kolinski, A., and Skolnick, J. (2001) TOUCHSTONE: an ab initio protein structure prediction method that uses threading-based tertiary restraints. Proc. Natl. Acad. Sci. USA 98, 10125–10130.
Lu, L., Arakaki, A. K., Lu, H., and Skolnick, J. (2003) Multimeric threading-based prediction of protein-protein interactions on a genomic scale: application to the Saccharomyces cerevisiae proteome. Genome Res. 13, 1146–1154.
Simons, K. T., Strauss, C., and Baker, D. (2001) Prospects for ab initio protein structural genomics. J. Mol. Biol. 306, 1191–1199.
Simons, K. T., Kooperberg, C., Huang, E., and Baker, D. (1997) Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J. Mol. Biol. 268, 209–225.
Chivian, D., Robertson, T., Bonneau, R., and Baker, D. (2003) Ab initio methods. In: Bourne, P. E. and Weissig, H. (eds.) Structural Bioinformatics, vol. 44. Wiley-Liss, Hoboken, NJ: 547–548.
Bonneau, R. and Baker, D. (2001) Ab initio protein structure prediction: progress and prospects. Annu. Rev. Biophys. Biomol. Struct. 30, 173–189.
Copyright information
© 2005 Humana Press Inc., Totowa, NJ
About this protocol
Cite this protocol
Torda, A.E. (2005). Protein Threading. In: Walker, J.M. (eds) The Proteomics Protocols Handbook. Springer Protocols Handbooks. Humana Press. https://doi.org/10.1385/1-59259-890-0:921
DOI: https://doi.org/10.1385/1-59259-890-0:921
Publisher Name: Humana Press
Print ISBN: 978-1-58829-343-5
Online ISBN: 978-1-59259-890-8
eBook Packages: Springer Protocols