Prediction of Protein Functions

  • Roy D. Sleator
Part of the Methods in Molecular Biology book series (MIMB, volume 815)


The recent explosion in the number and diversity of novel proteins identified by the large-scale “omics” technologies poses new and important questions for the blossoming field of systems biology: what are all these proteins, how did they come about, and, most importantly, what do they do?

From a comparatively small number of protein structural domains, a staggering array of structural variants has evolved, which has in turn given rise to a wide range of functional derivatives. This review considers the primary mechanisms that have contributed to the size of our existing, and expanding, protein repertoires, and outlines the protocols available for elucidating their true biological function. The various sequence- and structure-based function prediction programs available are discussed, and their strengths and weaknesses are outlined.
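The abstract does not say how sequence-based, homology-based transfer of function is carried out, so the following is only a minimal Python sketch of the general idea under stated assumptions: the query is assumed to be already aligned to each annotated sequence (in practice a search tool such as BLAST would produce such alignments), identity is computed over ungapped columns, and the 40% cut-off is a hypothetical value chosen for illustration. The names `percent_identity` and `transfer_annotation` are hypothetical and do not correspond to any program discussed in the review.

```python
from typing import Dict, Optional, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def transfer_annotation(
    query: str,
    annotated: Dict[str, Tuple[str, str]],
    min_identity: float = 40.0,  # hypothetical cut-off, for illustration only
) -> Optional[str]:
    """Return the function label of the most similar annotated sequence.

    `annotated` maps an identifier to (sequence aligned to the query, label).
    Returns None when no hit reaches `min_identity`.
    """
    best_label: Optional[str] = None
    best_identity = -1.0
    for aligned_seq, label in annotated.values():
        identity = percent_identity(query, aligned_seq)
        if identity > best_identity:
            best_label, best_identity = label, identity
    return best_label if best_identity >= min_identity else None


# Toy, made-up sequences and labels.
db = {"P1": ("MKV-LAGT", "hydrolase"), "P2": ("MQAWLSGT", "kinase")}
print(transfer_annotation("MKVALAGT", db))  # prints "hydrolase"
```

A fixed identity threshold is only a rough heuristic: similar sequences can still differ in function, and unrelated sequences can share one, which is why sequence-based transfer is usually complemented by structure-based and other evidence.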

Key words

Protein function; Homology-based transfer; Ontologies; Sequence and structure motifs; Evolution; Protein domains; Gene duplication; Divergence; Combination; Circular permutation



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Department of Biological Sciences, Cork Institute of Technology, Bishopstown, Ireland
