Molecular Biotechnology, Volume 23, Issue 2, pp 139–166

Bioinformatics methods to predict protein structure and function

A practical approach


Protein structure prediction using bioinformatics can involve sequence similarity searches, multiple sequence alignment, identification and characterization of domains, secondary structure prediction, solvent accessibility prediction, automatic protein fold recognition, construction of three-dimensional models to atomic detail, and model validation. Not every protein structure prediction project uses all of these techniques. A central step in a typical prediction is identifying a suitable structural template from which to extrapolate three-dimensional information for a query sequence. How this template is found defines three types of project. The first type uses standard, well-understood techniques. If a structural template remains elusive, the second type requires nontrivial methods. If a target fold cannot be identified reliably because nontrivial analyses give inconsistent results, the project falls into the third type and is virtually impossible to complete with any degree of reliability. In this article, a set of protocols for predicting protein structure from sequence is presented, and the distinctions among the three types of project are described. Used appropriately, these methods can provide valuable indicators of protein structure and function.
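The three-way distinction between project types can be expressed as a small decision function. This is a minimal, hypothetical sketch: the names `SearchOutcome`, `classify_project`, and `ProjectType`, and the rule that "consistent" means every nontrivial method agrees on a single fold, are illustrative assumptions rather than the authors' protocol. Treating "no fold found at all" as the third type is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ProjectType(Enum):
    STANDARD = 1     # template found with standard, well-understood techniques
    NONTRIVIAL = 2   # template found only with nontrivial methods
    INTRACTABLE = 3  # nontrivial analyses do not converge on one fold


@dataclass
class SearchOutcome:
    # Hypothetical: template identifier from standard searches, or None.
    standard_template: Optional[str]
    # Hypothetical: fold assignments returned by several nontrivial methods.
    nontrivial_folds: List[str]


def classify_project(outcome: SearchOutcome) -> ProjectType:
    """Assign a project to one of the three types (illustrative rule only)."""
    if outcome.standard_template is not None:
        return ProjectType.STANDARD
    distinct_folds = set(outcome.nontrivial_folds)
    if len(distinct_folds) == 1:
        # All nontrivial methods agree on one fold: assumed to be reliable.
        return ProjectType.NONTRIVIAL
    # Conflicting folds, or no fold at all (assumption): type 3.
    return ProjectType.INTRACTABLE
```

For example, `classify_project(SearchOutcome(None, ["foldA", "foldB"]))` returns `ProjectType.INTRACTABLE`, because the two hypothetical fold assignments disagree. A real project would weigh confidence scores rather than require unanimous agreement; the exact-match rule here only keeps the sketch short.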

Index Entries

Molecular modeling, sequence similarity searches, multiple sequence alignment, identification and characterization of domains, secondary structure prediction, solvent accessibility prediction, automatic protein fold recognition





Copyright information

© Humana Press Inc 2003

Authors and Affiliations

  1. Research Division, UK Human Genome Mapping Project Resource Center, Wellcome Trust Genome Campus, Hinxton, Cambridge, England, UK
