Protein Threading

Torda, Andrew E.

doi:10.1385/1-59259-890-0:921

Part of the book series: Springer Protocols Handbooks (SPH)

Abstract

Theoreticians have been trying to predict protein structure from sequence information for decades. More than a quarter century ago, there were optimistic reports that simulation methods could be used to calculate the structure of a small protein given only its sequence (1,2). To this day, devotees of this approach persevere and may ultimately overcome the problems posed by force fields and the enormous search space. In the meantime, a class of protein structure prediction methods has developed, traveling under names such as “protein threading” and “fold recognition.”

References

Levitt, M. (1975) Computer simulation of protein folding. Nature 253, 694–698.
Levitt, M. (1976) A simplified representation of protein conformations for rapid simulation of protein folding. J. Mol. Biol. 104, 59–107.
Crippen, G. M. and Maiorov, V. N. (1995) How many protein-folding motifs are there? J. Mol. Biol. 252, 144–151.
Leonov, H., Mitchell, J. S. B., and Arkin, I. T. (2003) Monte Carlo estimation of the number of possible protein folds: effects of sampling bias and folds distributions. Proteins 51, 352–359.
Wolf, Y. I., Grishin, N. V., and Koonin, E. V. (2000) Estimating the number of protein folds and families from complete genome data. J. Mol. Biol. 299, 897–905.
Govindarajan, S., Recabarren, R., and Goldstein, R. A. (1999) Estimating the total number of protein folds. Proteins 35, 408–414.
Zhang, C. O. and DeLisi, C. (1998) Estimating the number of protein folds. J. Mol. Biol. 284, 1301–1305.
Wang, Z. X. (1998) A re-estimation for the total numbers of protein folds and superfamilies. Protein Eng. 11, 621–626.
Zhang, C. T. (1997) Relations of the numbers of protein sequences, families and folds. Protein Eng. 10, 757–761.
Wang, Z. X. (1996) How many fold types of protein are there in nature? Proteins 26, 186–191.
Orengo, C. A., Jones, D. T., and Thornton, J. M. (1994) Protein superfamilies and domain superfolds. Nature 372, 631–634.
Chothia, C. (1992) Proteins. One thousand families for the molecular biologist. Nature 357, 543–544.
England, J. L., Shakhnovich, B. E., and Shakhnovich, E. I. (2003) Natural selection of more designable folds: a mechanism for thermophilic adaptation. Proc. Natl. Acad. Sci. USA 100, 8727–8731.
Li, H., Tang, C., and Wingreen, N. S. (2002) Designability of protein structures: a lattice-model study using the Miyazawa-Jernigan matrix. Proteins 49, 403–412.
Miller, J., Zeng, C., Wingreen, N. S., and Tang, C. (2002) Emergence of highly designable protein-backbone conformations in an off-lattice model. Proteins 47, 506–512.
Helling, R., Li, H., Melin, R., et al. (2001) The designability of protein structures. J. Mol. Graph. Model. 19, 157–167.
Shahrezaei, V. and Ejtehadi, M. R. (2000) Geometry selects highly designable structures. J. Chem. Phys. 113, 6437–6442.
Bornberg-Bauer, E. (1997) How are model protein structures distributed in sequence space? Biophys. J. 73, 2393–2403.
Govindarajan, S. and Goldstein, R. A. (1996) Why are some protein structures so common? Proc. Natl. Acad. Sci. USA 93, 3341–3345.
Orengo, C. (1994) Classification of protein folds. Curr. Opin. Struct. Biol. 4, 429–440.
Berman, H. M., Westbrook, J., Feng, Z., et al. (2000) The Protein Data Bank. Nucleic Acids Res. 28, 235–242.
Brenner, S. E., Chothia, C., and Hubbard, T. J. P. (1998) Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. Proc. Natl. Acad. Sci. USA 95, 6073–6078.
Rost, B. (1999) Twilight zone of protein sequence alignments. Protein Eng. 12, 85–94.
Pearson, W. and Lipman, D. (1988) Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85, 2444–2448.
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990) Basic local alignment search tool. J. Mol. Biol. 215, 403–410.
Altschul, S. F., Madden, T. L., Schäffer, A. A., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
Madej, T., Gibrat, J. F., and Bryant, S. H. (1995) Threading a database of protein cores. Proteins 23, 356–369.
Huber, T. and Torda, A. E. (2002) Protein structure prediction by threading: force field philosophy, approaches to alignment. In: Tsigelny, I. F. (ed.), Protein Structure Prediction: A Bioinformatic Approach, International University Line, La Jolla, pp. 263–
Cornell, W. D., Cieplak, P., Bayly, C. I., et al. (1995) A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. J. Am. Chem. Soc. 117, 5179–5197.
van Gunsteren, W. F., Billeter, S. R., Eising, A. A., et al. (1996) Biomolecular simulation: the GROMOS96 manual and user guide, vdf Hochschulverlag AG an der ETH Zurich and BIOMOS b.v., Zurich and Groningen.
MacKerell, A. D., Bashford, D., Bellott, M., et al. (1998) All-atom empirical potential for molecular modeling and dynamics studies of proteins. J. Phys. Chem. B 102, 3586–3616.
Brooks, B. R., Bruccoleri, R. E., Olafson, B. D., States, D. J., Swaminathan, S., and Karplus, M. (1983) CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J. Comput. Chem. 4, 187–217.
Chandler, D. (1987) Introduction to Modern Statistical Mechanics, Oxford University Press, New York.
Miyazawa, S. and Jernigan, R. L. (1985) Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation. Macromolecules 18, 534–552.
Tanaka, S. and Scheraga, H. A. (1976) Statistical mechanical treatment of protein conformation. 1. Conformational properties of amino acids in proteins. Macromolecules 9, 142–159.
Sippl, M. J. (1993) Boltzmann’s principle, knowledge-based mean fields and protein folding. An approach to the computational determination of protein structures. J. Comput. Aided Mol. Des. 7, 473–501.
Jones, D. T., Taylor, W. R., and Thornton, J. M. (1992) A new approach to protein fold recognition. Nature 358, 86–89.
Skolnick, J., Jaroszewski, L., Kolinski, A., and Godzik, A. (1997) Derivation and testing of pair potentials for protein folding. When is the quasichemical approximation correct? Protein Sci. 6, 676–688.
Ben-Naim, A. (1997) Statistical potentials extracted from protein structures: are these meaningful potentials? J. Chem. Phys. 107, 3698–3706.
Thomas, P. D. and Dill, K. (1996) Statistical potentials extracted from protein structures: how accurate are they? J. Mol. Biol. 257, 457–469.
Sippl, M. J. (1996) Helmholtz free energy of peptide hydrogen bonds in proteins. J. Mol. Biol. 260, 644–648.
Sippl, M. J., Ortner, M., Jaritz, M., Lackner, P., and Flöckner, H. (1996) Helmholtz free energies of atom pair interactions in proteins. Fold. Des. 1, 289–298.
Shortle, D. (2003) Propensities, probabilities, and the Boltzmann hypothesis. Protein Sci. 12, 1298–1302.
Crippen, G. M. and Snow, M. E. (1990) A 1.8 Å resolution potential function for protein folding. Biopolymers 29, 1479–1489.
Crippen, G. M. (1996) Easily searched protein folding potentials. J. Mol. Biol. 260, 467–475.
Goldstein, R. A., Luthey-Schulten, Z. A., and Wolynes, P. G. (1992) Protein tertiary structure recognition using optimized Hamiltonians with local interactions. Proc. Natl. Acad. Sci. USA 89, 9029–9033.
Maiorov, V. N. and Crippen, G. M. (1992) Contact potential that recognizes the correct folding of globular proteins. J. Mol. Biol. 227, 876–888.
Seetharamulu, P. and Crippen, G. M. (1991) A potential function for protein folding. J. Math. Chem. 6, 91–110.
Ulrich, P., Scott, W., van Gunsteren, W. F., and Torda, A. E. (1997) Protein structure prediction force fields: parametrization with quasi-Newtonian dynamics. Proteins 27, 367–384.
Huber, T. and Torda, A. E. (1998) Protein fold recognition without Boltzmann statistics or explicit physical basis. Protein Sci. 7, 142–149.
Hao, M. H. and Scheraga, H. A. (1996) How optimization of potential functions affects protein folding. Proc. Natl. Acad. Sci. USA 93, 4984–4989.
Mirny, L. A. and Shakhnovich, E. I. (1996) How to derive a protein folding potential? A new approach to an old problem. J. Mol. Biol. 264, 1164–1179.
Koretke, K. K., Luthey-Schulten, Z., and Wolynes, P. G. (1996) Self-consistently optimized statistical mechanical energy functions for sequence structure alignment. Protein Sci. 5, 1043–1059.
Lemer, C. M., Rooman, M. J., and Wodak, S. J. (1995) Protein structure prediction by threading methods: evaluation of current techniques. Proteins 23, 337–355.
Chang, J., Carrillo, M. W., Waugh, A., Wei, L. P., and Altman, R. B. (2002) Scoring functions sensitive to alignment error have a more difficult search: a paradox for threading. In: Eaton, G. R., Wiley, D. C., and Jardetzky, O. (eds.), Structures and Mechanisms: From Ashes to Enzymes, vol. 827, Oxford University Press, Oxford, UK, pp. 309–320.
Lathrop, R. H. (1994) The protein threading problem with sequence amino acid interaction preferences is NP-complete. Protein Eng. 7, 1059–1068.
Smith, T. F. and Waterman, M. S. (1981) Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197.
Needleman, S. B. and Wunsch, C. D. (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453.
Kirkpatrick, S., Gelatt, C. D., Jr., and Vecchi, M. P. (1983) Optimization by simulated annealing. Science 220, 671–680.
Bryant, S. H. and Lawrence, C. E. (1993) An empirical energy function for threading protein sequence through the folding motif. Proteins 16, 92–112.
Wilmanns, M. and Eisenberg, D. (1995) Inverse protein folding by the residue pair preference profile method: estimating the correctness of alignments of structurally compatible sequences. Protein Eng. 8, 627–639.
Godzik, A., Kolinski, A., and Skolnick, J. (1992) Topology fingerprint approach to the inverse protein folding problem. J. Mol. Biol. 227, 227–238.
Taylor, W. R. (1997) Multiple sequence threading: an analysis of alignment quality and stability. J. Mol. Biol. 269, 902–943.
Huber, T. and Torda, A. E. (1999) Protein sequence threading, the alignment problem and a two step strategy. J. Comput. Chem. 20, 1455–1467.
Xu, Y. and Xu, D. (2000) Protein threading using PROSPECT: design and evaluation. Proteins 40, 343–354.
Lathrop, R. H. (1999) An anytime local-to-global optimization algorithm for protein threading in O(m²n²) space. J. Comput. Biol. 6, 405–418.
Xu, Y. and Uberbacher, E. C. (1996) A polynomial-time algorithm for a class of protein threading problems. Comput. Appl. Biosci. 12, 511–517.
Lathrop, R. H. and Smith, T. F. (1996) Global optimum protein threading with gapped alignment and empirical pair score functions. J. Mol. Biol. 255, 641–665.
Xu, J. and Li, M. (2003) Assessment of RAPTOR’s linear programming approach in CAFASP3. Proteins 53, 579–584.
Crippen, G. M. (1996) Failures of inverse folding and threading with gapped alignment. Proteins 26, 167–171.
Park, B. H., Huang, E. S., and Levitt, M. (1997) Factors affecting the ability of energy functions to discriminate correct from incorrect folds. J. Mol. Biol. 266, 831–846.
Altschul, S. F., Boguski, M. S., Gish, W., and Wootton, J. C. (1994) Issues in searching molecular sequence databases. Nat. Genet. 6, 119–129.
Altschul, S. F. and Gish, W. (1996) Local alignment statistics. Methods Enzymol. 266, 460–480.
Karlin, S. and Altschul, S. F. (1990) Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc. Natl. Acad. Sci. USA 87, 2264–2268.
Pearson, W. R. (1998) Empirical statistical estimates for sequence similarity searches. J. Mol. Biol. 276, 71–84.
Mott, R. (2000) Accurate formula for p-values of gapped local sequence and profile alignments. J. Mol. Biol. 300, 649–659.
Sommer, I., Zien, A., von Ohsen, N., Zimmer, R., and Lengauer, T. (2002) Confidence measures for protein fold recognition. Bioinformatics 18, 802–812.
Jones, D. T. (1999) GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences. J. Mol. Biol. 287, 797–815.
Juan, D., Grana, O., Pazos, F., Fariselli, P., Casadio, R., and Valencia, A. (2003) A neural network approach to evaluate fold recognition results. Proteins 50, 600–608.
Xu, Y., Xu, D., and Olman, V. (2002) A practical method for interpretation of threading scores: an application of neural network. Stat. Sin. 12, 159–177.
McGuffin, L. J. and Jones, D. T. (2003) Improvement of the GenTHREADER method for genomic fold recognition. Bioinformatics 19, 874–881.
Karplus, K., Sjolander, K., Barrett, C., et al. (1997) Predicting protein structure using hidden Markov models. Proteins, 134–139.
Karplus, K., Barrett, C., and Hughey, R. (1998) Hidden Markov models for detecting remote protein homologies. Bioinformatics 14, 846–856.
Karplus, K., Barrett, C., Cline, M., Diekhans, M., Grate, L., and Hughey, R. (1999) Predicting protein structure using only sequence information. Proteins, 121–125.
Panchenko, A. R., Marchler-Bauer, A., and Bryant, S. H. (2000) Combination of threading potentials and sequence profiles improves fold recognition. J. Mol. Biol. 296, 1319–1331.
Russell, A. and Torda, A. E. (2002) Protein sequence threading: averaging over structures. Proteins 47, 496–505.
Kelley, L. A., MacCallum, R. M., and Sternberg, M. J. E. (2000) Enhanced genome annotation using structural profiles in the program 3D-PSSM. J. Mol. Biol. 299, 499–520.
Fischer, D. and Eisenberg, D. (1996) Protein fold recognition using sequence-derived predictions. Protein Sci. 5, 947–955.
Russell, R. B., Copley, R. R., and Barton, G. J. (1996) Protein fold recognition by mapping predicted secondary structures. J. Mol. Biol. 259, 349–365.
Rost, B., Schneider, R., and Sander, C. (1997) Protein fold recognition by prediction-based threading. J. Mol. Biol. 270, 471–480.
Di Francesco, V., Munson, P. J., and Garnier, J. (1999) FORESST: fold recognition from secondary structure predictions of proteins. Bioinformatics 15, 131–140.
Ayers, D. J., Gooley, P. R., Widmer-Cooper, A., and Torda, A. E. (1999) Enhanced protein fold recognition using secondary structure information from NMR. Protein Sci. 8, 1127–1133.
Hargbo, J. and Elofsson, A. (1999) Hidden Markov models that use predicted secondary structures for fold recognition. Proteins 36, 68–76.
Ota, M., Kawabata, T., Kinjo, A. R., and Nishikawa, K. (1999) Cooperative approach for the protein fold recognition. Proteins, 126–132.
Koretke, K. K., Russell, R. B., Copley, R. R., and Lupas, A. N. (1999) Fold recognition using sequence and secondary structure information. Proteins, 141–148.
Rost, B. and Liu, J. F. (2003) The PredictProtein server. Nucleic Acids Res. 31, 3300–3304.
Eyrich, V. A. and Rost, B. (2003) META-PP: single interface to crucial prediction servers. Nucleic Acids Res. 31, 3308–3310.
Koh, I. Y. Y., Eyrich, V. A., Marti-Renom, M. A., et al. (2003) EVA: evaluation of protein structure prediction servers. Nucleic Acids Res. 31, 3311–3315.
Kim, D., Xu, D., Guo, J. T., Ellrott, K., and Xu, Y. (2003) PROSPECT II: protein structure prediction program for genome-scale applications. Protein Eng. 16, 641–650.
Lu, L., Lu, H., and Skolnick, J. (2002) MULTIPROSPECTOR: an algorithm for the prediction of protein-protein interactions by multimeric threading. Proteins 49, 350–364.
McGuffin, L. J. and Jones, D. T. (2002) Targeting novel folds for structural genomics. Proteins 48, 44–52.
Jones, D. T. (2001) Predicting novel protein folds by using FRAGFOLD. Proteins, 127–132.
Skolnick, J., Kolinski, A., Kihara, D., Betancourt, M., Rotkiewicz, P., and Boniecki, M. (2001) Ab initio protein structure prediction via a combination of threading, lattice folding, clustering, and structure refinement. Proteins, 149–156.
Zhang, Y., Kolinski, A., and Skolnick, J. (2003) TOUCHSTONE II: a new approach to ab initio protein structure prediction. Biophys. J. 85, 1145–1164.
Kihara, D., Lu, H., Kolinski, A., and Skolnick, J. (2001) TOUCHSTONE: an ab initio protein structure prediction method that uses threading-based tertiary restraints. Proc. Natl. Acad. Sci. USA 98, 10125–10130.
Lu, L., Arakaki, A. K., Lu, H., and Skolnick, J. (2003) Multimeric threading-based prediction of protein-protein interactions on a genomic scale: application to the Saccharomyces cerevisiae proteome. Genome Res. 13, 1146–1154.
Simons, K. T., Strauss, C., and Baker, D. (2001) Prospects for ab initio protein structural genomics. J. Mol. Biol. 306, 1191–1199.
Simons, K. T., Kooperberg, C., Huang, E., and Baker, D. (1997) Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J. Mol. Biol. 268, 209–225.
Chivian, D., Robertson, T., Bonneau, R., and Baker, D. (2003) Ab initio methods. In: Bourne, P. E. and Weissig, H. (eds.), Structural Bioinformatics, vol. 44, Wiley-Liss, Hoboken, NJ, pp. 547–548.
Bonneau, R. and Baker, D. (2001) Ab initio protein structure prediction: progress and prospects. Annu. Rev. Biophys. Biomol. Struct. 30, 173–189.

Author information

Authors and Affiliations

Zentrum für Bioinformatik, University of Hamburg, Hamburg, Germany
Andrew E. Torda

Editor information

Editors and Affiliations

University of Hertfordshire, Hatfield, UK
John M. Walker

About this protocol

Cite this protocol

Torda, A.E. (2005). Protein Threading. In: Walker, J.M. (eds) The Proteomics Protocols Handbook. Springer Protocols Handbooks. Humana Press. https://doi.org/10.1385/1-59259-890-0:921

DOI: https://doi.org/10.1385/1-59259-890-0:921
Publisher Name: Humana Press
Print ISBN: 978-1-58829-343-5
Online ISBN: 978-1-59259-890-8
eBook Packages: Springer Protocols