Protein Structure Comparison and Classification

  • Orhan Çamoğlu
  • Ambuj K. Singh


The success of genome projects has generated an enormous amount of sequence data. To realize the full value of these data, we need to understand the functional role and evolutionary origin of each sequence. Sequence comparison methods are highly valuable for this task. However, for sequences falling in the twilight zone (usually between 20 and 35% sequence similarity), sequence comparison alone is unreliable, and we need to resort to structural alignment and comparison for a meaningful analysis. Such a structural approach can be used for the classification of proteins, the isolation of structural motifs, and the discovery of drug targets.


Keywords: Protein Data Bank, Index Structure, Structure Alignment, Query Protein, Initial Alignment





Copyright information

© Springer 2007
