Summary
The observation that similar protein sequences fold into similar three-dimensional structures underlies methods that predict structural features of a novel protein from the similarity between its sequence and the sequences of proteins with known structures. Similarity over the entire sequence or over large sequence fragments enables prediction and modeling of entire structural domains, while statistics derived from the distributions of local features in known protein structures make it possible to predict such features in proteins of unknown structure. The accuracy of protein structure models is sufficient for many practical purposes, such as the analysis of point mutation effects, enzymatic reactions, interaction interfaces of protein complexes, and active sites. Protein models are also used for phasing crystallographic data and, in some cases, for drug design. Using models can avoid the costly and time-consuming process of experimental structure determination. The purpose of this chapter is to give a practical review of the most popular protein structure prediction methods based on sequence similarity and to outline a practical approach to protein structure prediction. While the main focus of the chapter is on template-based protein structure prediction, it also provides references to other methods and programs that play an important role in protein structure prediction.
Copyright information
© 2009 Humana Press, a part of Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Jaroszewski, L. (2009). Protein Structure Prediction Based on Sequence Similarity. In: Astakhov, V. (ed.) Biomedical Informatics. Methods in Molecular Biology™, vol 569. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-59745-524-4_7
DOI: https://doi.org/10.1007/978-1-59745-524-4_7
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-934115-63-3
Online ISBN: 978-1-59745-524-4
eBook Packages: Springer Protocols