
The Protein Universes

Some Informatic Issues in Protein Classification

Chapter in: Power Laws, Scale-Free Networks and Genome Biology

Part of the book series: Molecular Biology Intelligence Unit (MBIU)

Abstract

We discuss some informatic problems in protein classification. We first address a neglected problem in sequence classification: the information lost through alphabet contraction. Because reduced alphabets are a standard bioinformatic tool, this loss is a significant issue. We review recent work showing that information-theoretic methods can quantify the amount of structural information carried by a given sequence representation. These tools are then used to construct reduced alphabets of a specified size that retain the maximum possible amount of structural information. We then turn to structure classification. After briefly reviewing previous work in this field, we discuss the fact that sequence classification and structure classification give different pictures of protein space. Finally, we outline ongoing research that seeks new parameters which explicitly encode how protein sequences select their architecture.
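One way to make the alphabet-contraction argument concrete is to treat each residue position as a pair (residue, structural class) and to measure the mutual information I(S; C) = Σ p(s,c) log2[p(s,c) / (p(s) p(c))] between the reduced-alphabet symbol S and a discrete structural class C. The alphabet is then shrunk by greedily merging the pair of groups whose merge keeps the most information. This is a minimal sketch under stated assumptions, not the procedure used in the work reviewed here: the three-state structural classes, the greedy agglomerative search, the toy counts, and the function names `mutual_information`, `group_map` and `greedy_reduced_alphabet` are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(pairs, alphabet_map):
    """Mutual information I(S; C), in bits, between the reduced-alphabet
    symbol S of a residue and its structural class C.

    pairs: list of (residue, structural_class) observations (hypothetical format).
    alphabet_map: dict mapping each residue to its reduced-alphabet group label.
    """
    joint = Counter((alphabet_map[r], c) for r, c in pairs)
    n = sum(joint.values())
    # Marginal counts for S and C, accumulated from the joint counts.
    count_s, count_c = Counter(), Counter()
    for (s, c), k in joint.items():
        count_s[s] += k
        count_c[c] += k
    mi = 0.0
    for (s, c), k in joint.items():
        p_sc = k / n
        mi += p_sc * math.log2(p_sc / ((count_s[s] / n) * (count_c[c] / n)))
    return mi


def group_map(groups):
    """Map every residue to the index of the group that contains it."""
    return {r: i for i, g in enumerate(groups) for r in g}


def greedy_reduced_alphabet(pairs, target_size):
    """Greedily merge residue groups until only target_size remain, at each
    step keeping the merge that retains the most mutual information.
    A heuristic: it is not guaranteed to find the optimal partition."""
    groups = [frozenset([r]) for r in sorted({r for r, _ in pairs})]
    while len(groups) > target_size:
        best_mi, best_groups = -1.0, None
        for i, j in combinations(range(len(groups)), 2):
            candidate = [g for k, g in enumerate(groups) if k not in (i, j)]
            candidate.append(groups[i] | groups[j])
            mi = mutual_information(pairs, group_map(candidate))
            if mi > best_mi:
                best_mi, best_groups = mi, candidate
        groups = best_groups
    return groups, mutual_information(pairs, group_map(groups))


# Toy data (invented for illustration): helix (H), strand (E), coil (C).
pairs = ([("A", "H")] * 5 + [("L", "H")] * 5 + [("V", "E")] * 5
         + [("I", "E")] * 5 + [("G", "C")] * 5 + [("P", "C")] * 5)

groups, mi = greedy_reduced_alphabet(pairs, target_size=3)
print(sorted(sorted(g) for g in groups), round(mi, 3))
```

On this toy data each residue always adopts one class, so merging residues with the same preference loses nothing: the three-letter alphabet {A, L}, {V, I}, {G, P} keeps the full log2 3 ≈ 1.585 bits. Because a merge is a deterministic function of the original symbol, it can never increase I(S; C); the returned value therefore tracks how much structural information a contracted alphabet gives up.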

Copyright information

© 2006 Eurekah.com and Springer Science+Business Media

About this chapter

Cite this chapter

Rackovsky, S. (2006). The Protein Universes. In: Power Laws, Scale-Free Networks and Genome Biology. Molecular Biology Intelligence Unit. Springer, Boston, MA. https://doi.org/10.1007/0-387-33916-7_11
