Protein Structure Prediction Based on Sequence Similarity

Biomedical Informatics

Part of the book series: Methods in Molecular Biology™ ((MIMB,volume 569))

Summary

The observation that similar protein sequences fold into similar three-dimensional structures underlies methods that predict structural features of a novel protein from the similarity between its sequence and the sequences of proteins with known structures. Similarity over an entire sequence, or over large sequence fragments, enables prediction and modeling of entire structural domains, while statistics derived from the distributions of local features in known protein structures make it possible to predict such features in proteins of unknown structure. The accuracy of protein structure models is sufficient for many practical purposes, such as analysis of point mutation effects, enzymatic reactions, interaction interfaces of protein complexes, and active sites. Protein models are also used for phasing crystallographic data and, in some cases, for drug design. Using models can therefore avoid the costly and time-consuming process of experimental structure determination. The purpose of this chapter is to give a practical review of the most popular protein structure prediction methods based on sequence similarity and to outline a practical approach to protein structure prediction. While the main focus is on template-based protein structure prediction, the chapter also provides references to other methods and programs that play an important role in protein structure prediction.
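
As a rough illustration of the kind of sequence similarity the summary refers to, the sketch below computes percent identity between a query and a template that have already been aligned. It is a minimal example, not a method taken from the chapter: the function name `percent_identity`, the use of "-" as the gap character, and the choice to count only columns where both sequences have a residue are all assumptions made for illustration.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over the gap-free columns of a pairwise alignment.

    Both strings are assumed to be rows of the same alignment, so they
    must have equal length; "-" is assumed to mark a gap.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")

    aligned = 0    # columns where both sequences have a residue
    identical = 0  # aligned columns where the residues match

    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # skip columns containing a gap
        aligned += 1
        if q.upper() == t.upper():
            identical += 1

    if aligned == 0:
        return 0.0
    return 100.0 * identical / aligned


if __name__ == "__main__":
    # Hypothetical toy alignment: 5 gap-free columns, 4 of them identical.
    print(percent_identity("ACD-EF", "ACDGEY"))
```

Other normalizations, such as dividing by the length of the shorter sequence or of the full alignment, are also in common use and give different values for the same alignment, so the convention should be stated whenever an identity figure is reported.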




Copyright information

© 2009 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Jaroszewski, L. (2009). Protein Structure Prediction Based on Sequence Similarity. In: Astakhov, V. (eds) Biomedical Informatics. Methods in Molecular Biology™, vol 569. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-59745-524-4_7


  • DOI: https://doi.org/10.1007/978-1-59745-524-4_7


  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-934115-63-3

  • Online ISBN: 978-1-59745-524-4

  • eBook Packages: Springer Protocols
