Protein Structure Prediction as a Systems Problem

  • Dong Xu
  • Ying Xu
Part of the Biological and Medical Physics, Biomedical Engineering book series (BIOMEDICAL)


Protein structure prediction is a complex problem whose different aspects require different types of techniques. To address this complexity, a large number of computational methods have been developed, as discussed in the previous chapters of this book. When the protein structure prediction problem was first formulated, it was envisioned that any protein structure could be predicted computationally by minimizing a unified physical energy function, since protein folding is governed by the laws of physics. Such an approach is referred to as “ab initio” protein structure prediction (Tanaka and Scheraga, 1977; States et al., 1980; Chapter 13 of this book). While appealing and theoretically achievable, ab initio methods currently have only limited practical value for structure prediction, owing to the inadequacy of current physical models and the significant gap between the required and the available computing power, among other issues.






  1. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.
  2. Attwood, T.K., Flower, D.R., Lewis, A.P., Mabey, J.E., Morgan, S.R., Scordis, P., Selley, J., and Wright, W. 1999. PRINTS prepares for the new millennium. Nucleic Acids Res. 27:220–225.
  3. Bairoch, A., and Apweiler, R. 1999. The SwissProt protein sequence data bank and its supplement TrEMBL in 1999. Nucleic Acids Res. 27:49–54.
  4. Baker, S.H., Lorbach, S.C., Rodriguez-Buey, M., et al. 1999. The correlation of the gene csoS2 of the carboxysome operon with two polypeptides of the carboxysome in Thiobacillus neapolitanus. Arch. Microbiol. 172:233–239.
  5. Bernstein, F.C., Koetzle, T.F., Williams, G.J.B., Meyer, E.F., Brice, M.D., Rodgers, J.R., Kennard, O., Shimanouchi, T., and Tasumi, M. 1977. The Protein Data Bank: A computer based archival file for macromolecular structures. J. Mol. Biol. 112:535–542.
  6. Blake, J.D., and Cohen, F.E. 2001. Pairwise sequence alignment below the twilight zone. J. Mol. Biol. 307:721–735.
  7. Bowie, J.U., Luthy, R., and Eisenberg, D. 1991. A method to identify protein sequences that fold into a known three-dimensional structure. Science 253:164–170.
  8. Brooks, B.R., Bruccoleri, R.E., Olafson, B.D., States, D.J., Swaminathan, S., and Karplus, M. 1983. CHARMM: A program for macromolecular energy, minimization, and dynamics calculations. J. Comput. Chem. 4:187–217.
  9. Browne, W.J., North, A.C.T., Phillips, D.C., Brew, K., Vanaman, T.C., and Hill, R.C. 1969. A possible three-dimensional structure of bovine alpha-lactalbumin based on that of hen’s egg-white lysozyme. J. Mol. Biol. 42:65.
  10. Cannon, G.C., Bradburne, C.E., Aldrich, H.C., Baker, S.H., Heinhorst, S., and Shively, J.M. 2001. Microcompartments in prokaryotes: Carboxysomes and related polyhedra. Appl. Environ. Microbiol. 67:5351–5361.
  11. Case, D.A., Cheatham, T.E., 3rd, Darden, T., Gohlke, H., Luo, R., Merz, K.M., Jr., Onufriev, A., Simmerling, C., Wang, B., and Woods, R.J. 2005. The Amber biomolecular simulation programs. J. Comput. Chem. 26:1668–1688.
  12. Cherkasov, A., Ho Sui, S.J., Brunham, R.C., and Jones, S.J. 2004. Structural characterization of genomes by large scale sequence-structure threading: Application of reliability analysis in structural genomics. BMC Bioinformatics 5:101.
  13. Cherkasov, A., and Jones, S.J. 2004. Structural characterization of genomes by large scale sequence-structure threading. BMC Bioinformatics 5:37.
  14. Chien, C., Bartel, P., Sternglanz, R., and Fields, S. 1991. The two-hybrid system: A method to identify and clone genes for proteins that interact with a protein of interest. Proc. Natl. Acad. Sci. USA 88:9578–9582.
  15. Chivian, D., Kim, D.E., Malmstrom, L., Bradley, P., Robertson, T., Murphy, P., Strauss, C.E., Bonneau, R., Rohl, C.A., and Baker, D. 2003. Automated prediction of CASP-5 structures using the Robetta server. Proteins 53(Suppl. 6):524–533.
  16. Corpet, F., Servant, F., Gouzy, J., and Kahn, D. 2000. ProDom and ProDom-CG: Tools for protein domain analysis and whole genome comparisons. Nucleic Acids Res. 28:267–269.
  17. Deshpande, N., Addess, K.J., Bluhm, W.E., Merino-Ott, J.C., Townsend-Merino, W., Zhang, Q., Knezevich, C., Xie, L., Chen, L., Feng, Z., Green, R.K., Flippen-Anderson, J.L., Westbrook, J., Berman, H.M., and Bourne, P.E. 2005. The RCSB Protein Data Bank: A redesigned query system and relational database based on the mmCIF schema. Nucleic Acids Res. 33(Database issue):D233–D237.
  18. Edwards, Y.J., and Cottage, A. 2003. Bioinformatics methods to predict protein structure and function. A practical approach. Mol. Biotechnol. 23:139–166.
  19. Fetrow, J.S., Giammona, A., Kolinski, A., and Skolnick, J. 2002. The protein folding problem: A biophysical enigma. Curr. Pharm. Biotechnol. 3:329–347.
  20. Fischer, D. 2000. Hybrid fold recognition: Combining sequence derived properties with evolutionary information. Pac. Symp. Biocomput. Hawaii, pp. 119–130, World Scientific.
  21. Fischer, D. 2003. 3D-SHOTGUN: A novel, cooperative, fold-recognition metapredictor. Proteins 51:434–441.
  22. Fischer, D., and Eisenberg, D. 1997. Assigning folds to the proteins encoded by the genome of Mycoplasma genitalium. Proc. Natl. Acad. Sci. USA 94:11929–11934.
  23. Fischer, D., Rychlewski, L., Dunbrack, R.L., Jr., Ortiz, A.R., and Elofsson, A. 2003. CAFASP3: the third critical assessment of fully automated structure prediction methods. Proteins 53(Suppl. 6):503–516.
  24. Forgy, C.F. 1982. Rete: A fast algorithm for the many pattern/many object pattern match problem. Artif. Intell. 19:17–37.
  25. Friedberg, D., Jager, K.M., Kessel, M., Silman, N.J., and Bergman, B. 1993. Rubisco but not Rubisco activase is clustered in the carboxysomes of the cyanobacterium Synechococcus sp. PCC 7942: Mud-induced carboxysomeless mutants. Mol. Microbiol. 9:1193–1201.
  26. Friedman-Hill, E. 2003. Jess in Action: Java Rule-Based Systems. Greenwich, CT, Manning Publications.
  27. Giarratano, J.C., and Riley, G.D. 2004. Expert Systems: Principles and Programming, 4th edition. Boston, Course Technology.
  28. Ginalski, K., Elofsson, A., Fischer, D., and Rychlewski, L. 2003. 3D-Jury: A simple approach to improve protein structure predictions. Bioinformatics 19:1015–1018.
  29. Ginalski, K., Grishin, N.V., Godzik, A., and Rychlewski, L. 2005. Practical lessons from protein structure prediction. Nucleic Acids Res. 33:1874–1891.
  30. Ginalski, K., and Rychlewski, L. 2003. Protein structure prediction of CASP5 comparative modeling and fold recognition targets using consensus alignment approach and 3D assessment. Proteins 53(Suppl. 6):410–417.
  31. Godzik, A. 2003. Fold recognition methods. Methods Biochem. Anal. 44:525–546.
  32. Greer, J. 1980. Model for haptoglobin heavy chain based upon structural homology. Proc. Natl. Acad. Sci. USA 77:3393–3397.
  33. Guo, J.T., Ellrott, K., Chung, W.J., Xu, D., Passovets, S., and Xu, Y. 2004. PROSPECT-PSPP: An automatic computational pipeline for protein structure prediction. Nucleic Acids Res. 32(Web Server issue):W522–W525.
  34. Henikoff, J.G., Henikoff, S., and Pietrokovski, S. 1999. New features of the blocks database servers. Nucleic Acids Res. 27:226–228.
  35. Hirokawa, T., Boon-Cheing, S., and Mitaku, S. 1998. Classification and secondary structure prediction system for membrane proteins. Bioinformatics 14:378–379.
  36. Hofmann, K., Bucher, P., Falquet, L., and Bairoch, A. 1999. The PROSITE database, its status in 1999. Nucleic Acids Res. 27:215–219.
  37. Jansen, J.M., and Martin, E.J. 2004. Target-biased scoring approaches and expert systems in structure-based virtual screening. Curr. Opin. Chem. Biol. 8:359–364.
  38. Jones, D.T. 1999. Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292:195–202.
  39. Jones, D.T. 2001. Protein structure prediction in genomics. Brief Bioinform. 2:111–125.
  40. Jones, D.T., Taylor, W.R., and Thornton, J.M. 1992. A new approach to protein fold recognition. Nature 358:86–89.
  41. Juan, D., Grana, O., Pazos, F., Fariselli, P., Casadio, R., and Valencia, A. 2003. A neural network approach to evaluate fold recognition results. Proteins 50:600–608.
  42. Kim, D., Xu, D., Guo, J.T., Ellrott, K., and Xu, Y. 2003. PROSPECT II: Protein structure prediction program for genome-scale applications. Protein Eng. 16:641–650.
  43. Kitson, D.H., Badretdinov, A., Zhu, Z.Y., Velikanov, M., Edwards, D.J., Olszewski, K., Szalma, S., and Yan, L. 2002. Functional annotation of proteomic sequences based on consensus of sequence and structural analysis. Brief Bioinform. 3:32–44.
  44. Koh, I.Y., Eyrich, V.A., Marti-Renom, M.A., Przybylski, D., Madhusudhan, M.S., Eswar, N., Grana, O., Pazos, F., Valencia, A., Sali, A., and Rost, B. 2003. EVA: Evaluation of protein structure prediction servers. Nucleic Acids Res. 31:3311–3315.
  45. Krogh, A., Larsson, B., von Heijne, G., and Sonnhammer, E.L. 2001. Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes. J. Mol. Biol. 305:567–580.
  46. Kurowski, M.A., and Bujnicki, J.M. 2003. GeneSilico protein structure prediction meta-server. Nucleic Acids Res. 31:3305–3307.
  47. Laskowski, R.A., MacArthur, M.W., Moss, D.S., and Thornton, J.M. 1993. PROCHECK: A program to check the stereochemical quality of protein structures. J. Appl. Crystallogr. 26:283–291.
  48. Leahy, D.J., Erickson, H.P., Aukhil, I., et al. 1994. Crystallization of a fragment of human fibronectin: Introduction of methionine by site-directed mutagenesis to allow phasing via selenomethionine. Proteins 19:48–54.
  49. Leplae, R., and Hubbard, T.J. 2002. MaxBench: Evaluation of sequence and structure comparison methods. Bioinformatics 18:494–495.
  50. Lundstrom, J., Rychlewski, L., Bujnicki, J., and Elofsson, A. 2001. Pcons: A neural-network-based consensus predictor that improves fold recognition. Protein Sci. 10:2354–2362.
  51. Lupas, A., van Dyke, M., and Stock, J. 1991. Predicting coiled coils from protein sequences. Science 252:1162–1164.
  52. Lytle, B.L., Peterson, F.C., and Volkman, B.F. in press. Solution structure of a human C2H2-type zinc finger protein.
  53. Metaxiotis, K.S., and Samouilidis, J.E. 2000. Expert systems in medicine: Academic exercise or practical tool? J. Med. Eng. Technol. 24:68–72.
  54. Moult, J. 2005. A decade of CASP: Progress, bottlenecks and prognosis in protein structure prediction. Curr. Opin. Struct. Biol. 15:285–289.
  55. Murzin, A.G., Brenner, S.E., Hubbard, T., and Chothia, C. 1995. SCOP: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247:536–540.
  56. Nielsen, H., Engelbrecht, J., Brunak, S., and von Heijne, G. 1997. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng. 10:1–6.
  57. Partensky, F., Hess, W.R., and Vaulot, D. 1999. Prochlorococcus, a marine photosynthetic prokaryote of global significance. Microbiol. Mol. Biol. Rev. 63:106–127.
  58. Prakash, B., Praefcke, G.J., Renault, L., Wittinghofer, A., and Herrmann, C. 2000. Structure of human guanylate-binding protein 1 representing a unique class of GTP-binding proteins. Nature 403:567–571.
  59. Qian, J., Luscombe, N.M., and Gerstein, M. 2001. Protein family and fold occurrence in genomes: Power-law behaviour and evolutionary model. J. Mol. Biol. 313:673–681.
  60. Radivojac, P., Obradovic, Z., Smith, D.K., Zhu, G., Vucetic, S., Brown, C.J., Lawson, J.D., and Dunker, A.K. 2004. Protein flexibility and intrinsic disorder. Protein Sci. 13:71–80.
  61. Russell, R.B., Saqi, M.A., Sayle, R.A., Bates, P.A., and Sternberg, M.J. 1997. Recognition of analogous and homologous protein folds: Analysis of sequence and structure conservation. J. Mol. Biol. 269:423–439.
  62. Rychlewski, L., Fischer, D., and Elofsson, A. 2003. LiveBench-6: Large-scale automated evaluation of protein structure prediction servers. Proteins 53(Suppl. 6):542–547.
  63. Sali, A., and Blundell, T.L. 1993. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234:779–815.
  64. Sanchez, R., and Sali, A. 1998. Large-scale protein structure modeling of the Saccharomyces cerevisiae genome. Proc. Natl. Acad. Sci. USA 95:13597–13602.
  65. Shah, M., Passovets, S., Kim, D., Ellrott, K., Wang, L., Vokler, I., LoCascio, P., Xu, D., and Xu, Y. 2003. A computational pipeline for protein structure prediction and analysis at genome scale. Bioinformatics 19:1985–1996.
  66. Simons, K.T., Kooperberg, C., Huang, E., and Baker, D. 1997. Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J. Mol. Biol. 268:209–225.
  67. Sommer, I., Zien, A., von Ohsen, N., Zimmer, R., and Lengauer, T. 2002. Confidence measures for protein fold recognition. Bioinformatics 18:802–812.
  68. States, D.J., Dobson, C.M., Karplus, M., and Creighton, T.E. 1980. A conformational isomer of bovine pancreatic trypsin inhibitor protein produced by refolding. Nature 286:630–632.
  69. Steward, A., Adhya, S., and Clarke, J. 2002. Sequence conservation in Ig-like domains: The role of highly conserved proline residues in the fibronectin type III superfamily. J. Mol. Biol. 318:935–940.
  70. Tanaka, S., and Scheraga, H.A. 1977. Model of protein folding: Incorporation of a one-dimensional short-range (Ising) model into a three-dimensional model. Proc. Natl. Acad. Sci. USA 74:1320–1323.
  71. Tramontano, A., and Morea, V. 2003. Assessment of homology-based predictions in CASP5. Proteins 53(Suppl. 6):352–368.
  72. Venclovas, C., Zemla, A., Fidelis, K., and Moult, J. 2003. Assessment of progress over the CASP experiments. Proteins 53(Suppl. 6):585–595.
  73. Vinogradova, M.V., Stone, D.B., Malanina, G.G., Karatzaferi, C., Cooke, R., Mendelson, R.A., and Fletterick, R.J. 2005. Ca(2+)-regulated structural changes in troponin. Proc. Natl. Acad. Sci. USA 102:5038–5043.
  74. Vriend, G. 1990. WHAT IF: A molecular modelling and drug design program. J. Mol. Graph. 8:52–56.
  75. Wallace, A.C., Laskowski, R.A., and Thornton, J.M. 1996. Derivation of 3D coordinate templates for searching structural databases: Application to the Ser-His-Asp catalytic triads of the serine proteinases and lipases. Protein Sci. 5:1001–1013.
  76. Wallner, B., Fang, H., and Elofsson, A. 2003. Automatic consensus-based fold recognition using Pcons, ProQ, and Pmodeller. Proteins 53(Suppl. 6):534–541.
  77. Whisstock, J.C., and Lesk, A.M. 2003. Prediction of protein function from protein sequence and structure. Q. Rev. Biophys. 36:307–340.
  78. Wolfson, H.J., Shatsky, M., Schneidman-Duhovny, D., Dror, O., Shulman-Peleg, A., Ma, B., and Nussinov, R. 2005. From structure to function: Methods and applications. Curr. Protein Pept. Sci. 6:171–183.
  79. Xu, D., Baburaj, K., Peterson, C.B., and Xu, Y. 2001a. Model for the three-dimensional structure of vitronectin: Predictions for the multi-domain protein from threading and docking. Proteins 44:312–320.
  80. Xu, D., Crawford, O.H., LoCascio, P.F., and Xu, Y. 2001b. Application of PROSPECT in CASP4: Characterizing protein structures with new folds. Proteins Struct. Funct. Genet. (CASP4 Special Issue) 46:140–148.
  81. Xu, D., Kim, D., Dam, P., Shah, M., Uberbacher, E.C., and Xu, Y. 2003. Characterization of protein structure and function at genome scale using a computational prediction pipeline. In Genetic Engineering, Principles and Methods, edited by J. K. Setlow. New York, Kluwer Academic/Plenum Publishers, pp. 269–293.
  82. Xu, Y., and Xu, D. 2000. Protein threading using PROSPECT: Design and evaluation. Proteins 40:343–354.
  83. Xu, Y., Xu, D., Crawford, O.H., Larimer, E.F., Uberbacher, E., Unseren, M.A., and Zhang, G. 1999. Protein threading by PROSPECT: A prediction experiment in CASP3. Protein Eng. 12:899–907.
  84. Zhang, B., Rychlewski, L., Pawlowski, K., Fetrow, J.S., Skolnick, J., and Godzik, A. 1999. From fold predictions to function predictions: Automation of functional site conservation analysis for functional genome predictions. Protein Sci. 8:1104–1115.
  85. Zhang, W., and Chait, B.T. 2000. ProFound: An expert system for protein identification using mass spectrometric peptide mapping information. Anal. Chem. 72:2482–2489.

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • Dong Xu (1)
  • Ying Xu (2)

  1. Computer Science Department, University of Missouri-Columbia, Columbia
  2. Institute of Bioinformatics and Department of Biochemistry and Molecular Biology, University of Georgia, Athens
