Abstract
We discuss some informatic problems in protein classification. We first address a neglected problem in sequence classification: the information loss that results from alphabet contraction. Because reduced alphabets are a standard bioinformatic tool, this is a significant issue. We review recent work showing that information-theoretic methods can be used to quantify the amount of structural information carried by a specified sequence representation. These tools are then used to construct reduced alphabets of specified size that retain the maximum possible amount of structural information. We then turn to structure classification. After briefly reviewing previous work in this field, we discuss the fact that sequence and structure classification give different pictures of the protein space. We outline ongoing research that seeks new parameters which explicitly encode the choice of architecture by protein sequences.
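As an illustration of the kind of quantity involved, the following minimal sketch estimates the mutual information between a sequence alphabet and a set of structural states, and shows how contracting the alphabet can either preserve or destroy that information. It is not the method of the chapter: the plug-in estimator, the function names `mutual_information` and `contract`, the two-state structural labels, and the toy observations are all assumptions made for the example.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate, in bits, of I(symbol; state) from (symbol, state) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    symbols = Counter(s for s, _ in pairs)
    states = Counter(t for _, t in pairs)
    mi = 0.0
    for (s, t), c in joint.items():
        # p(s,t) * log2( p(s,t) / (p(s) * p(t)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (symbols[s] * states[t]))
    return mi


def contract(pairs, mapping):
    """Relabel each residue with its group in a reduced alphabet."""
    return [(mapping[s], t) for s, t in pairs]


# Hypothetical toy data: (residue, structural state), H = helix, E = strand.
toy = [("A", "H")] * 4 + [("L", "H")] * 4 + [("V", "E")] * 4 + [("I", "E")] * 4

full = mutual_information(toy)
# Grouping residues that share a structural preference keeps the information.
good = mutual_information(contract(toy, {"A": "x", "L": "x", "V": "y", "I": "y"}))
# Grouping residues with opposite preferences loses it.
bad = mutual_information(contract(toy, {"A": "x", "V": "x", "L": "y", "I": "y"}))

print(f"full alphabet: {full:.3f} bits")
print(f"good 2-letter: {good:.3f} bits")
print(f"bad 2-letter:  {bad:.3f} bits")
```

In this toy case two different two-letter alphabets carry very different amounts of structural information, which is why the choice of grouping, and not only the alphabet size, matters. Real data would need a far larger sample and a correction for the bias of the plug-in estimator.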
References
Rackovsky S. “Hidden” sequence periodicities and protein architecture. Proc Nat Acad Sci USA 1998;95:8580–8584.
Gerstein M. A structural census of genomes: Comparing bacterial, eukaryotic, and archaeal genomes in terms of protein structure. J Mol Biol 1997;274:562–576.
Rackovsky S. Quantitative organization of the known protein X-ray structures. I. Methods and short length-scale results. Proteins: Structure, Function and Genetics 1990;7:378–402.
Yee DP, Dill KA. Families and the structural relatedness among globular proteins. Prot Sci 1993;2:884–899.
Hou J, Sims GE, Shang C et al. A global representation of the protein fold space. Proc Nat Acad Sci USA 2003;100:2386–2390.
Holm L, Sander C. Dali/FSSP classification of three-dimensional protein folds. Nucleic Acids Res 1997;25:231–234.
Gonnet GH, Cohen MA, Benner SA. Exhaustive matching of the entire protein sequence database. Science 1992;256:1443–1445.
Linial M, Linial N, Tishby N et al. Global self-organization of all known protein sequences reveals inherent biological signatures. J Mol Biol 1997;268:539–556.
Gracy J, Argos P. Automated protein sequence database classification. I. Integration of compositional similarity search, local similarity search, and multiple sequence alignment. Bioinformatics 1998;14:164–173.
Wang H-C, Dopazo J, De La Fraga LG et al. Self-organizing tree-growing network for the classification of protein sequences. Prot Sci 1998;7:2613–2622.
Yona G, Linial N, Linial M. ProtoMap: Automatic classification of protein sequences, a hierarchy of protein families, and local maps of the protein space. Proteins: Structure Function and Genetics 1999;37:360–378.
Dokholyan NV, Shakhnovich B, Shakhnovich EI. Expanding protein universe and its origin from the biological big bang. Proc Nat Acad Sci USA 2002;99:14132–14136.
Albert R, Barabási A-L. Statistical mechanics of complex networks. Rev Mod Phys 2002;74:47–97.
Myers EW. Seeing conserved signals: Using algorithms to detect similarities between biosequences. In: Lander ES, Waterman MS, eds. Calculating the Secrets of Life. Washington, DC: National Academy Press, 1995.
Barton GJ. Protein sequence alignment techniques. Acta Cryst 1998;D54:1139–1146.
Smith TF. The art of matchmaking: Sequence alignment methods and their structural implications. Structure 1999;7:R7–R12.
Dayhoff MO, Eck RV. Atlas of Protein Sequence and Structure. Silver Spring, MD: NBRF Press, 1996:2.
Henikoff S, Henikoff J. Amino acid substitution matrices from protein blocks. Proc Nat Acad Sci USA 1992;89:10915–10919.
Altschul SF. A protein alignment scoring system sensitive at all evolutionary distances. J Mol Evol 1993;36:290–300.
Naor D, Fischer D, Jernigan RL et al. Amino acid pair interchanges at spatially conserved locations. J Mol Biol 1996;256:924–938.
Russell RB, Saqi MAS, Sayle RA et al. Recognition of analogous and homologous protein folds: Analysis of sequence and structure conservation. J Mol Biol 1997;269:423–439.
Johnson MS, Overington JP. A structural basis for sequence comparison: An evaluation of scoring methodologies. J Mol Biol 1993;233:716–738.
Prlic A, Domingues FS, Sippl MJ. Structure-derived substitution matrices for alignment of distantly related sequences. Protein Engineering 2000;13:545–550.
Blake JD, Cohen FE. Pairwise sequence alignment below the twilight zone. J Mol Biol 2001;307:721–735.
Smith TF, Waterman MS. Identification of common molecular subsequences. J Mol Biol 1981;147:195–197.
Altschul SF. Generalized affine gap costs for protein sequence alignment. Proteins: Structure Function and Genetics 1998;32:88–96.
Argos P, Vingron M, Vogt G. Protein sequence comparisons: Methods and significance. Protein Eng 1991;4:375–383.
Saqi M, Sternberg M. A simple method to generate nontrivial alternate alignments of protein sequences. J Mol Biol 1991;219:727–732.
Zuker M. Suboptimal sequence alignment in molecular biology: Alignment with error analysis. J Mol Biol 1991;221:403–420.
Agarwal P, States D. A Bayesian evolutionary distance for parametrically aligned sequences. J Comput Biol 1996;3:1–17.
Vingron M. Near-optimal sequence alignment. Curr Opin in Struct Biol 1996;6:346–352.
Horowitz E, Sahni S. Fundamentals of Computer Algorithms. New York, NY: Computer Science Press, 1978:198–247.
Pearson W, Lipman D. Improved tools for biological sequence comparison. Proc Nat Acad Sci USA 1988;85:2444–2448.
Altschul S, Gish W, Miller W et al. Basic local alignment search tool. J Mol Biol 1990;215:403–410.
Krogh A, Brown M, Mian J et al. Hidden Markov models in computational biology: Applications to protein modeling. J Mol Biol 1994;235:1501–1531.
Eddy S. Hidden Markov models. Curr Opin Struct Biol 1996;6:361–365.
Bucher P, Hoffman K. A sequence similarity algorithm based on a probabilistic interpretation of an alignment scoring system. In: States D, Gaasterland T, Hunter L, Smith R, eds. ISMB-4. Menlo Park: AAAI Press, 1996.
Lipman DJ, Altschul SF, Kececioglu J. A tool for multiple sequence alignment. Proc Nat Acad Sci USA 1989;86:4412–4415.
Notredame C, Higgins DG. SAGA: Sequence Alignment by Genetic Algorithm. Nucl Acids Res 1996;24:1515–1524.
Brenner SE, Chothia C, Hubbard TJP. Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. Proc Nat Acad Sci USA 1998;95:6073–6078.
Sauder JM, Arthur JW, Dunbrack Jr RL. Large-scale comparison of protein sequence alignment algorithms with structure alignments. Proteins: Structure, Function and Genetics 2000;40:6–22.
Panchenko AR, Bryant SH. A comparison of position-specific score matrices based on sequence and structure alignments. Prot Sci 2002;11:361–370.
Blaisdell BE. A measure of the similarity of sets of sequences not requiring sequence alignment. Proc Nat Acad Sci USA 1986;83:5155–5159.
Blaisdell BE. Average values of a dissimilarity measure not requiring sequence alignment are twice the averages of conventional mismatch counts requiring sequence alignment for a variety of computer-generated model systems. J Mol Evol 1991;32:521–528.
Yona G, Levitt M. A unified sequence-structure classification of protein sequences: Combining sequence and structure in a map of the protein space. Tokyo: Proceedings of the Fourth Annual Conference on Computational Molecular Biology, 2000:308–317.
Solis AD, Rackovsky S. Optimized representations and maximal information in proteins. Proteins: Structure Function and Genetics 2000;38:149–164.
Solis AD, Rackovsky S. Optimally informative backbone structural propensities in proteins. Proteins: Structure Function and Genetics 2002;48:463–486.
Solis AD. Structural information from local sequence of proteins and DNA. Thesis, Mt. Sinai School of Medicine of New York University 2002;148–191.
Kuznetsov IB, Solis AD, Rackovsky S. (work in progress).
Brown NP, Orengo CP, Taylor WR. A protein structure comparison methodology. Computers Chem 1996;20:359–380.
Wallin S, Farwer J, Bastolla U. Testing similarity measures with continuous and discrete protein models. Proteins: Structure Function and Genetics 2003;50:144–157.
Godzik A. The structural alignment between two proteins: Is there a unique answer? Prot Sci 1996;5:1325–1338.
Rackovsky S, Scheraga HA. Differential geometry and polymer conformations. I. On the comparison of polymer conformations. Macromolecules 1978;11:1168–1174.
Rackovsky S, Scheraga HA. Differential geometry and polymer conformations. II. Mathematical considerations and a conformational distance function. Macromolecules 1980;13:1440–1453.
Rackovsky S, Scheraga HA. Intermolecular anti-parallel beta sheet: Comparison of predicted and observed conformations of gramicidin S. Proc Nat Acad Sci USA 1980;77:6965–6967.
Rackovsky S, Scheraga HA. Differential geometry and polymer conformations. III. Nearest-neighbor correlations and medium-range structure. Macromolecules 1981;14:1259–1269.
Rackovsky S, Scheraga HA. Differential geometry and polymer conformations. IV. Conformational and nucleation properties of individual amino acids. Macromolecules 1982;15:1340–1346.
Rackovsky S, Scheraga HA. Differential geometry and protein folding. Accounts of Chemical Research 1984;17:209–214.
Rackovsky S, Goldstein DA. Differential geometry and protein conformation. V. Medium-range conformational influence of the individual amino acids. Biopolymers 1987;26:1163–1187.
Rackovsky S, Goldstein DA. Protein comparison and classification: A differential geometric approach. Proc Nat Acad Sci USA 1988;85:777–781.
Pevzner P. Personal communication.
Rackovsky S. Quantitative classification of the known protein X-ray structures. Polymer Preprints 1990;31:205.
Holm L, Sander C. Protein structure comparison by alignment of distance matrices. J Mol Biol 1993;233:123–138.
Holm L, Sander C. Mapping the protein universe. Science 1996;273:595–602.
Brenner SE, Chothia C, Hubbard TJP. Population statistics of protein structures: Lessons from structural classifications. Curr Opin Struct Biol 1997;7:369–376.
Fischer D, Tsai C-J, Nussinov R et al. A 3D sequence-independent representation of the protein data bank. Protein Engineering 1995;8:981–997.
Leibowitz N, Fligelman Z, Nussinov R et al. Automated multiple structure alignment and detection of a common motif. Proteins: Structure Function and Genetics 2001;43:235–245.
Dror O, Benyamini H, Nussinov R et al. MASS: Multiple structure alignment by secondary structures. Bioinformatics 2003;19(Suppl.1):i95–i104.
Levitt M, Gerstein M. A unified statistical framework for sequence comparison and structure comparison. Proc Nat Acad Sci USA 1998;95:5913–5920.
Qian J, Luscombe NM, Gerstein M. Protein family and fold occurrence in genomes: Power-law behaviour and evolutionary model. J Mol Biol 2001;313:673–681.
Kuznetsov VA. In: Zhang W, Shmulevich I, eds. Computational and Statistical Approaches to Genomics. Boston: Kluwer, 2002:125–171.
Karev GP, Wolf YI, Rzhetsky AY et al. In: Galperin MY, Koonin EV, eds. Computational Genomics: From Sequence to Function. Amsterdam: Horizon, 2003:261–314.
Yanai I, Camacho C, DeLisi C. Predictions of gene family distributions in microbial genomes: Evolution by gene duplication and modification. Phys Rev Lett 2000;85:2641–2644.
Kidera A, Konishi Y, Oka M et al. Statistical analysis of the physical properties of the 20 naturally occurring amino acids. J Prot Chem 1985;4:23–55.
Kidera A, Konishi Y, Ooi T et al. Relation between sequence similarity and structural similarity in proteins. Role of important properties of amino acids. J Prot Chem 1985;4:265–297.
Rackovsky S. (work in progress).
Yang A-S, Honig B. An integrated approach to the analysis and modeling of protein sequences and structures. II. On the relationship between sequence and structural similarity for proteins that are not obviously related in sequence. J Mol Biol 2000;301:679–689.
Alm E, Baker D. Matching theory and experiment in protein folding. Curr Opin Struct Biol 1999;9:189–196.
Shea JE, Onuchic JN, Brooks CL III. Exploring the origins of topological frustration: Design of a minimally frustrated model of fragment B of protein A. Proc Nat Acad Sci USA 1999;96:12512–12517.
Onuchic JN, Nymeyer H, Garcia AE et al. The energy landscape theory of protein folding: Insights in folding mechanism and scenarios. Adv Prot Chem 2000;53:87–152.
Micheletti C, Banavar JR, Maritan A et al. Protein structures and optimal folding from a geometrical variational principle. Phys Rev Lett 1999;82:3372–3375.
Abkevich V, Gutin A, Shakhnovich E. Specific nucleus as the transition state for protein folding: Evidence from the lattice model. Biochemistry 1994;33:10026–10036.
Baldwin RL. Folding consensus? Nature Struct Biol 2001;8:92–94.
Fersht AR. Transition-state structure as a unifying basis in protein-folding mechanisms: Contact order, chain topology, stability, and the extended nucleus mechanism. Proc Nat Acad Sci USA 2000;97:1525–1529.
Burns LL, Dalessio PIM, Ropson IJ. Folding mechanism of three structurally similar β-sheet proteins. Proteins: Structure, Function and Genetics 1998;33:107–188.
Copyright information
© 2006 Eurekah.com and Springer Science+Business Media
About this chapter
Cite this chapter
Rackovsky, S. (2006). The Protein Universes. In: Power Laws, Scale-Free Networks and Genome Biology. Molecular Biology Intelligence Unit. Springer, Boston, MA. https://doi.org/10.1007/0-387-33916-7_11
DOI: https://doi.org/10.1007/0-387-33916-7_11
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-25883-6
Online ISBN: 978-0-387-33916-0
eBook Packages: Physics and Astronomy (R0)