Protein Structure Prediction Based on Sequence Similarity

Jaroszewski, Lukasz

doi:10.1007/978-1-59745-524-4_7

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 569)

Summary

The observation that similar protein sequences fold into similar three-dimensional structures provides the basis for methods that predict the structural features of a novel protein from the similarity between its sequence and the sequences of proteins with known structures. Similarity over the entire sequence, or over one or more large sequence fragments, enables prediction and modeling of entire structural domains, while statistics derived from the distributions of local features in known protein structures make it possible to predict such features in proteins whose structures are unknown. The accuracy of protein structure models is sufficient for many practical purposes, such as analyzing the effects of point mutations, enzymatic reactions, interaction interfaces of protein complexes, and active sites. Protein models are also used for phasing crystallographic data and, in some cases, for drug design. By using models, one can avoid the costly and time-consuming process of experimental structure determination. The purpose of this chapter is to give a practical review of the most popular protein structure prediction methods based on sequence similarity and to outline a practical approach to protein structure prediction. While the main focus of this chapter is template-based protein structure prediction, it also provides references to other methods and programs that play an important role in protein structure prediction.
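The step these methods share is measuring how similar a query sequence is to the sequence of a protein with known structure (a potential template). The following is a minimal Python sketch of that step under stated assumptions: a global (Needleman–Wunsch) alignment, followed by percent identity and query coverage computed from the aligned pair. The function names, the toy scoring (match +1, mismatch -1, linear gap -2) and the identity convention are illustrative assumptions, not the chapter's method; practical template searches use substitution matrices, affine gap penalties and dedicated database search tools.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Toy scoring (assumption): real tools use substitution matrices such as
    BLOSUM62 and affine gap penalties.
    """
    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def identity_and_coverage(aln_query: str, aln_template: str) -> tuple[float, float]:
    """Percent identity over gap-free aligned columns, and the fraction of
    query residues that are aligned to a template residue."""
    pairs = [(q, t) for q, t in zip(aln_query, aln_template) if q != "-" and t != "-"]
    if not pairs:
        return 0.0, 0.0
    identical = sum(q == t for q, t in pairs)
    query_len = sum(c != "-" for c in aln_query)
    return 100.0 * identical / len(pairs), len(pairs) / query_len


if __name__ == "__main__":
    q, t = global_align("ACDEFGHIK", "ACDFGHIK")  # hypothetical query/template
    print(q)
    print(t)
    print(identity_and_coverage(q, t))
```

For the toy pair above, the optimal alignment places a single gap in the template opposite the query's E, so all 8 aligned columns are identical (100% identity) and 8 of 9 query residues are covered. Identity is computed here over gap-free columns only; other conventions divide by the shorter sequence or by the alignment length and give different numbers for the same alignment.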

Author information

Authors and Affiliations

The Burnham Institute, La Jolla, CA, USA
Lukasz Jaroszewski

Editor information

Editors and Affiliations

Biomedical Informatics Research Network, University of California, San Diego, 9500 Gilman Dr., La Jolla, CA 92093, USA
Vadim Astakhov

About this protocol

Cite this protocol

Jaroszewski, L. (2009). Protein Structure Prediction Based on Sequence Similarity. In: Astakhov, V. (ed.) Biomedical Informatics. Methods in Molecular Biology™, vol 569. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-59745-524-4_7

DOI: https://doi.org/10.1007/978-1-59745-524-4_7
Published: 29 June 2009
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-934115-63-3
Online ISBN: 978-1-59745-524-4
eBook Packages: Springer Protocols
